1. INTRODUCTION


The increasing demands of speed and performance in modern signal and
image processing applications necessitate a revolutionary supercomputing
technology. Sequential systems will be inadequate for future real-time processing
systems, and the tremendous computational capability of array processing will
become a necessity. In most real-time digital signal processing applications,
general purpose parallel computers cannot offer enough processing speed due to
severe system overheads. Therefore, special-purpose array processors will
become the only appealing alternative.


1.1 ARCHITECTURES :

Parallel computers can be divided into three architectural configurations
[Hwang84] :

-Pipeline computers or vector processors.

-Multiprocessor systems.

-Array processors.


A pipeline computer performs overlapped computations to exploit 'temporal
parallelism'. A multiprocessor system achieves 'asynchronous parallelism' through
a set of interactive processors with shared resources (memories, etc.). An array
processor uses multiple synchronized processors to achieve spatial parallelism.
The first two classes belong to the general-purpose computer domain. The
development of these systems requires a complicated design of control units and
optimized schemes for the allocation of machine resources.


The last class of computers offers a promising solution to meet real-time
processing requirements. In particular, locally interconnected computing networks,
such as systolic and wavefront arrays, are well suited to efficiently implement a
major class of signal processing algorithms, due to their massive parallelism and
regular data flow [HTKung82]. Therefore, we will focus on the systolic
architectures.

1.2 SYSTOLIC ARCHITECTURES :

In a systolic system, data flows from the computer memory in a rhythmic
fashion, passing through many processing elements before it returns to memory,
much as blood circulates to and from the heart [HTKung82]. The array can be
linear, rectangular, or hexagonal to make use of higher degrees of parallelism.

Computational tasks can be classified into two families : compute-bound
computations and I/O-bound computations [HTKung82]. In a computation, if the total
number of operations is larger than the total number of input and output
operations, then the computation is compute-bound; otherwise it is I/O-bound. For
example, ordinary matrix multiplication is compute-bound, whereas adding two
matrices is I/O-bound. Speeding up the I/O-bound computations requires an increase
in memory bandwidth, which is limited by the current technologies. Speeding up a
compute-bound computation, however, can be accomplished using systolic arrays.
The basic configuration of a systolic array is shown in fig. 1.1. By replacing a
single processor by a 1-D or 2-D array of processing elements (PEs), a higher
computation throughput can be achieved without increasing memory bandwidth.
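
The classification rule above can be illustrated with a minimal sketch. It assumes n x n matrices, counts one multiply-add (or one addition) as a single operation, and assumes every matrix element is read or written exactly once; the function names are illustrative only and do not come from the thesis.

```python
def classify(num_operations, num_io_transfers):
    """Compute-bound if operations outnumber I/O transfers, else I/O-bound."""
    return "compute-bound" if num_operations > num_io_transfers else "I/O-bound"


def matrix_multiply_counts(n):
    """(operations, I/O transfers) for an n x n matrix product."""
    operations = n ** 3          # n multiply-adds for each of the n*n results
    io_transfers = 3 * n ** 2    # read two input matrices, write one result
    return operations, io_transfers


def matrix_add_counts(n):
    """(operations, I/O transfers) for adding two n x n matrices."""
    operations = n ** 2          # one addition per result element
    io_transfers = 3 * n ** 2    # read two input matrices, write one result
    return operations, io_transfers


if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, classify(*matrix_multiply_counts(n)), classify(*matrix_add_counts(n)))
```

Under these counting assumptions the product becomes compute-bound for n > 3, while the sum stays I/O-bound for every n.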




Fig. 1.1 Basic configuration of Systolic Arrays. 

1.2.1 Definition of Systolic Arrays :

A systolic array is a computing network possessing the following features
[HTKung82], [SYKung87] :

-Synchrony : The data are rhythmically computed (timed by a global clock)
and passed through the network.

-Modularity and regularity : The array consists of modular processing units
with homogeneous interconnections. Moreover, the computing network may be
extended indefinitely.

-Spatial locality and temporal locality : The array manifests a locally
communicative interconnection structure, i.e. spatial locality. There is at least one
unit-time delay allocated so that signal transactions from one node to the next
node can be completed, i.e. temporal locality.

-Pipelinability (i.e., O(M) execution-time speedup) : The array exhibits
linear-rate pipelinability, i.e. it should achieve an O(M) speedup in terms of
processing rate, where M is the number of processing elements (PEs).













1.2.2 Why Systolic Architectures ?


The major factors that make systolic arrays useful for special-purpose
processing architectures are : simple and regular design, concurrency and
communication, and balancing computation with I/O [HTKung82].

Simple and Regular Design : By using a regular and simple design, great
savings in design cost can be achieved. Furthermore, simple and regular systems
are likely to be modular and therefore adjustable to various performance goals.

Concurrency and Communication : There are essentially two ways to build a
fast computer system. One is to use fast components, and the other is to use
concurrency. Since the technological trend clearly indicates a diminishing growth
rate for component speeds, concurrent use of many processing elements is
essential for any major improvement in speed. When a large number of processors
work together, communication becomes significant. In VLSI technology, routing costs
dominate the power, time, and area required to implement a computation; therefore,
regular and local communication in systolic arrays is advantageous.

Balancing Computation with I/O : The ultimate performance goal of an array
processor system is a computation rate that balances the available I/O bandwidth
with the host. With the relatively low bandwidth of current I/O devices, to achieve
a faster computation rate, it is necessary to perform multiple computations per I/O
access. However, the repetitive use of a data item requires it to be stored inside
the system for a sufficient length of time. In other words, the I/O problem influences
not only the required I/O bandwidth but also the required internal memory. The
question then is how to arrange a computation together with an appropriate memory
structure and I/O bandwidth, so that computation time is balanced with I/O time.

The I/O problem becomes especially severe when a large computation is
performed on a small array. In this case, the computation must be decomposed
(partitioning problem). In practice, this is often the case, and therefore, questions
such as how a computation can be decomposed to minimize I/O, and how buffer
memory can be arranged to minimize I/O, are critical to the practical design of an
array processor system.

A solution to the above challenges is systolic array processing. A systolic
system consists of a set of interconnected simple cells. Information in a systolic
system flows between cells in a pipelined fashion, and communication with the
outside world occurs only at the "boundary cells". Thus the computation rate of the
system can be balanced with the available I/O bandwidth with the host.
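
As an illustration of pipelined data flow between simple cells, the following sketch simulates a small linear systolic array that computes a convolution (FIR filter). It is one classic textbook arrangement, assumed here for illustration and not taken from this thesis: each cell holds one fixed weight, partial results move one cell per clock tick, and input samples move at half that speed through two registers per cell. Only cell 0 receives data from the host and only the last cell returns results, mirroring the boundary-cell communication described above. The function name `systolic_fir` is hypothetical.

```python
def systolic_fir(weights, xs):
    """Simulate a linear systolic array computing y[j] = sum_k w[k] * x[j-k].

    Cell k holds weights[k]. Partial sums advance one cell per tick; input
    samples advance one cell every two ticks (two delay registers per cell).
    """
    num_cells = len(weights)
    delay1 = [0] * num_cells   # first x delay register of each cell
    delay2 = [0] * num_cells   # second x delay register of each cell
    y_reg = [0] * num_cells    # latched partial sum leaving each cell
    results = []

    # len(xs) ticks to inject the samples plus (num_cells - 1) ticks to drain.
    for tick in range(len(xs) + num_cells - 1):
        # Values arriving at each cell this tick (taken from the old state).
        x_in = [xs[tick] if tick < len(xs) else 0] + delay2[:-1]
        y_in = [0] + y_reg[:-1]

        # Every cell performs one multiply-add in the same tick.
        y_out = [y_in[k] + weights[k] * x_in[k] for k in range(num_cells)]

        # Clock edge: shift the x delay line and latch the partial sums.
        delay2 = delay1[:]
        delay1 = x_in[:]
        y_reg = y_out

        # The first complete result leaves the last cell after the pipeline fills.
        if tick >= num_cells - 1:
            results.append(y_out[-1])
    return results


if __name__ == "__main__":
    print(systolic_fir([1, 2, 3], [1, 0, 0, 0]))
```

Feeding a unit impulse reproduces the weights at the output, which is a quick sanity check that the two streams meet in the right cells at the right ticks.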

1.3 OBJECTIVE AND SCOPE OF THE CURRENT WORK :

In this thesis an attempt has been made to develop a simulator for a linear
systolic array processor for developing and executing a class of signal
processing algorithms. The objectives of the current work are described below.

1) To define a systolic array architecture, which will work as an attached
processor to an external host.

2) To develop a simulator for this architecture using the ADSP 14XX and 32XX chip
set.

3) To develop a generalized, redefinable microassembler (meta-assembler), which
is used for developing programs for the simulator.

4) To test the simulator and meta-assembler through execution of some simple
systolic algorithms.




1.4 ORGANIZATION OF THE REPORT : 


Chapter 2 presents an overall view of the systolic array signal processor
(SASP), which has been simulated. We follow the array configuration proposed by
Nemawarkar [NEM88]. A simplified version of the SASP system hardware has been
developed in earlier theses and described in [Usman89] and [SAM83]. Several
additional details have been incorporated in the simulator described in this thesis,
though some of these features are not included in the present hardware
development, which is in progress [Subramanian90], [Shera90].

Chapter 3 details the design and implementation aspects of the SASP. The
chapter discusses the design of the processing element (cell) and the interface unit
(IFU).

Chapter 4 gives the simulator details for the SASP system. The methodology
adopted for the simulation is given.

Chapter 5 discusses the algorithms for matrix multiplication and convolution
on the SASP system. Verification of the meta-assembler and simulator is done using
these programs.

Chapter 6 describes the generalized, redefinable microassembler
(meta-assembler), which can be used for any microprogrammed architecture. The
facilities provided in the meta-assembler and its operation are also given.

Chapter 7 gives the conclusions and lists a few suggestions for future work.

The manual for the meta-assembler, the manual for the simulator, the definition
files for the microcode of the interface unit and the cell, and the manual for the
ADSP-1401 (sequencer), ADSP-1410 (address generator), ADSP-3210 (multiplier) and
ADSP-3220 (ALU) chips are given in Appendices A, B, C and D respectively.




2. SYSTEM OVERVIEW 


2.1 INTRODUCTION:- 

In this chapter, the Systolic Array Signal Processor (SASP) is
introduced. It is a systolic array computer of linearly connected cells, each of
which is a microprogrammable processor capable of performing floating-point
operations. It is designed for computation-intensive applications.

The architecture is similar to the Warp computer developed at Carnegie
Mellon [Anna87]. The system has been simulated on a PC, as well as on an HP-9000
system, and its working is tested by developing some algorithms. Some utilities (e.g.
the assembler) have been used in the development of the system hardware.

In this chapter the architecture of the SASP system and its main features are
discussed.

2.2 ARCHITECTURE :

The SASP system is attached to a general purpose host PC/XT through a
parallel interface. The system thus consists of three major parts : the host, the
interface unit (IFU) and the processor array (fig. 2.1).

1) Host :- The host supplies data to the array and receives the results
from the array. It programs the array to execute algorithms. In addition, it
executes those parts of algorithms which cannot be mapped onto the array.

2) Interface Unit :- The interface unit handles the input/output
operation between the array and the host, and it generates addresses (Addr) and
control signals for the processor array. It also regulates the flow of the
intermediate results through the processor array.


3) Processor Array :- The main computing power of the SASP is from the
processor array. The array consists of identical cells (processors). Each cell is a
programmable horizontal microengine, with its own sequencer, address generator,
data memory and program memory. The microcode for the cell can be downloaded by
the host through the broadcast bus (BC-bus).

Fig. 2.1 System Overview.

The cells are connected through inter-cell communication channels (X, Y and
Addr). Each cell can communicate with its neighbours (left and right). The 32-bit
data flows through the array on the X and Y channels. The addresses for the local
memories of the cell, generated by the IFU, propagate down the Addr channel. Moreover,
the cell has a local address generator, which helps in efficient loop realizations.
The direction of the Y channel can be reconfigured at any time, through microprogram
bits. This feature can be used, for example, in algorithms that require accumulated
results in the last cell to be sent back to the other cells, or require local
exchange of data between adjacent cells.

2.3 INTERCELL COMMUNICATION :

In the architecture of the SASP machine the global communication is only
through the broadcast bus (BC-bus). The BC-bus is used only for the
loading/reading of the microprogram and the data. During the execution of
algorithms only local communication channels are used. A queue is associated with
each channel (XQ, YQ and AddrQ) and is placed in the data path of the input cell.
Use of queues greatly enhances the intercell bandwidth.

The 'flow control' for the communication channels described below has been
assumed to be implemented in the hardware. When a cell tries to read from an
empty queue, it is blocked (i.e. cycles are skipped) until a data item arrives.
Similarly, when a cell tries to write into a queue of a neighbouring cell when it is
full, the writing cell is blocked until data is removed from the full queue. The
blocking of a cell is transparent to the program. Only the cell that tries to read
from an empty queue or to deposit a data item into a full queue is blocked. All
other cells in the array continue to operate normally. The data queues of a
blocked cell are still able to accept input; otherwise, a cell blocked on an empty
queue would never become unblocked.
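
The blocking rule can be sketched with a small cycle-by-cycle model of two neighbouring cells joined by one bounded queue. This is a simplified illustration under stated assumptions, not the SASP hardware: full/empty status is sampled at the start of each cycle, the consumer is busy for `consumer_period - 1` cycles after each read, and the names `run_pair` and `consumer_period` are hypothetical.

```python
from collections import deque


def run_pair(items, capacity, consumer_period=1):
    """Simulate a producer cell and a consumer cell sharing one bounded queue.

    Returns (received items, producer stall cycles, consumer stall cycles,
    total cycles). A stalled cell simply skips the cycle and retries.
    """
    queue = deque()
    to_send = deque(items)
    received = []
    producer_stalls = 0
    consumer_stalls = 0
    busy = 0      # cycles the consumer still spends on its previous item
    cycle = 0

    while len(received) < len(items):
        # Status flags as seen by both cells at the start of the cycle.
        can_read = len(queue) > 0
        can_write = len(queue) < capacity

        # Consumer: blocked (cycle skipped) while its input queue is empty.
        if busy > 0:
            busy -= 1
        elif can_read:
            received.append(queue.popleft())
            busy = consumer_period - 1
        else:
            consumer_stalls += 1

        # Producer: blocked (cycle skipped) while the neighbour's queue is full.
        if to_send:
            if can_write:
                queue.append(to_send.popleft())
            else:
                producer_stalls += 1

        cycle += 1

    return received, producer_stalls, consumer_stalls, cycle


if __name__ == "__main__":
    print(run_pair([1, 2, 3, 4], capacity=2))
    print(run_pair([1, 2, 3, 4], capacity=2, consumer_period=3))
```

Neither cell's program has to test the queue status: the stall counters only record cycles that the hardware would skip, and the data always arrive in order.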

The channels are described below: 

The X channel is a 32-bit wide data path and is unidirectional. It starts 
from IFU and ends on the last cell. The data on which computation is to be done is 
transmitted over this channel and it ripples through the cells without being 
modified. 

The Y channel is also 32-bit wide. It is bidirectional and its direction can
be statically reconfigured by the algorithm to be executed. This channel forms a
closed loop, starting from the IFU, running through the processor array and ending at
the IFU. Intermediate or final results travel on this channel. At the end of each
pass through the processor array, the Y channel terminates into the YA and YB memories
in the Interface Unit. Each cell can communicate with both of its neighbours
through this channel.

The Addr channel provides addresses for local data memories in the cells. 
Address is generated in the interface unit and transferred along with data on the 
X-channel. 

The Cntrl channel contains control signals to read from or write into 
queues and the status of the queues of the neighbouring cells. 

Signals coming to cell n from cell n+1 are as follows.

Xqfull     To indicate X queue of cell n+1 is full.
Yqfull     To indicate Y queue of cell n+1 is full.
Addrqfull  To indicate Addr queue of cell n+1 is full.
wryq       To write data into Y queue of cell n.

Signals going from cell n to cell n+1 are as follows.

wrxq*      To write data into X queue of cell n+1.
wryq*      To write data into Y queue of cell n+1.
wraddrq*   To write address into Addr queue of cell n+1.
yqfull     To indicate Y queue of cell n is full.

The signal 'wryq' coming from cell n+1 is considered only when the Y bus
direction is reversed.

The IFU is considered as cell 0.

For cell N, the signals going to the IFU are

wryb*      To write into YB memory in the IFU.
wrya*      To write into YA memory in the IFU.

2.4 BROADCAST BUS (BC-bus) :

The broadcast bus is used by the host to load data and microprograms into
each cell. Its signals are as follows.

1. Data lines - For loading/reading the microcode and data memory of each cell.

2. Cell address - The addressed cell communicates with the host through the
data lines.

3. Reset - System reset.

4. Y bus direction - This signal originates from the microcode memory of the
IFU. It decides the Y-bus direction through the array.

5. Read, write and handshake signals from/to the host interface for reading and
writing into the data memories and the microprogram memories of the array cells.

6. Flg - Flag input to all the sequencers in the system. It can be used to
indicate start of execution or end of execution of a program. Any cell can raise
this signal.

7. Wcs - Assertion of this signal forces the sequencer (ADSP-1401) in a cell
to execute the WCS instruction. This instruction is used for loading the microcode
into the microcode memory.

8. CLK - This is the global clock, broadcast over the BC-bus to be used by all
the cells.


2.5 THE CELL UNIT :

The block diagram of a cell is shown in fig. 2.2. Each cell has its own
program sequencer and has the data path shown in the figure. The following are the
major features of the cell unit data path :

ALU and Multiplier : The multiplier and ALU are implemented with the commercial
32-bit floating-point multiplier ADSP-3210 and floating-point ALU ADSP-3220 chips,
respectively. These chips use pipeline mode to achieve the maximum possible
throughput. That is, a chip starts a new 32-bit arithmetic operation every cycle,
although the result of an operation will not emerge from the chip's output port
until a few cycles after the operation starts. Thus the processor array supports
pipelining at both the array and the cell levels. These two levels of pipelining
greatly enhance the system throughput.

Reg-file : It contains 128 32-bit wide registers accessible from any of the 5
ports. Two ports are input ports, two are output ports and one port is
bidirectional. The register file is implemented using ADSP-3128 chips.

















X Queue, Y Queue, and Addr Queue : These queues are provided mainly to
ensure that the X, Y and Addr streams are properly synchronized, as required by
systolic algorithms. The queues are implemented using CY7C40 (512 X 9 FIFO) chips.

Data memory : Having a memory at each cell for buffering data, implementing
look-up tables, or storing intermediate results is essential for reducing the I/O
bandwidth requirement of the cells. Also, by using its local memory to store
temporary data, a cell can be multiplexed to implement the functions of multiple
cells in a systolic array design. As a result, for example, the SASP can implement
algorithms designed for two dimensional systolic arrays or one dimensional arrays
that have more cells than the one dimensional array of the machine.

Cross bar : The ALU, MPY, register files and I/O ports of the SASP cell are
linked by a crossbar. The crossbar can be reconfigured every cycle under the
control of microcode to allow a read port to get data from any of the six write
ports. The cross bar is implemented using multiplexers.

Input multiplexers : These are used to implement computations using the
wraparound or bidirectional data flow mode. In the wraparound mode the outputs of
the cell are fed back to its inputs, hence wrapping around the cell. This mode
multiplexes the use of one cell to implement the function of several. (The same
effect can also be achieved through the use of other resources, such as the data
memory.) This increases the virtual size of the array for problems requiring a
larger array size. In the bidirectional data flow mode the Y input of each cell can
take values from the Y output of the next cell, that is, the cell to the right. This
feature allows the SASP array to implement linear systolic arrays with
bidirectional data flows.


The detailed description of the system is given in chapter 3.



3. SYSTEM DESIGN 


In this chapter the details of the system design are described. The
functions of each major unit and the overall block diagram have already been
introduced in chapter 2.

3.1 SASP Interface Unit (IFU) :

Fig. 3.1 shows the block diagram of the IFU. The interface to the host enables
the host to access different parts of the IFU as well as the cells. A list of
functions done by the interface unit is given below.

i) Supplying X and Y data to the array at the required rate.

ii) Routing data according to the configuration of the array (forward
or reverse).

iii) Receiving intermediate results and looping them back into the array.

iv) Receiving and storing output results.

v) Generating addresses for the address bus.

vi) Acting as an interface between the array and the host.

Each of these blocks shown in figure 3.1 is described below. 

3.1.1 Interface to Host :

This unit contains buffers, decoders and control lines coming from the host.
The buffers buffer the address, data and control lines from the host. The
decoder selects the proper blocks for writing or reading of data. The control lines
are ready, read and write signals.

Fig. 3.1 Interface Unit Block Diagram.


Before the start of execution by the array, the host dumps the data onto the
RAMs of the interface unit and onto the data RAMs of each cell over the broadcast
bus as described in section 2.4 (fig. 2.1). The code is also dumped onto the
microcode RAMs of the IFU and the cells in the same way.

The interface unit takes over control of all the blocks after the host has
finished loading the microcode and data, and supervises all routing and
exchange of data during execution of the algorithm. After the execution is over, it
generates an interrupt, asking the host to take the necessary action. The host
may then read the results and process them further.

3.1.2 Microengine and Address generator unit :


The control unit of the interface unit has been implemented as a
programmable microengine. It is a standard practice to use microprocessors to
control the data flow and to perform computations in a 'smart' circuit. But the
inherent sequential nature of microprocessor operation prevents its use in high
throughput machines, where more functional parallelism is required. This
parallelism can be achieved by using the microcode approach. The main difference
between microprocessor circuits and microcode circuits is that the functional
units integrated in a microprocessor are spread out as separate building blocks in
a microcoded circuit, so that they can operate simultaneously and independently.

In order to coordinate the independently operating devices, these
functional blocks are operated in synchronism with a common system clock. Control
instructions/signals for each device are put together in a central microcode
memory and a microinstruction location is accessed in every system clock cycle.

For such a system, it is necessary to have a sophisticated program flow
that accommodates nested loops, subroutines, interrupts, etc. Such demanding
sequencing tasks are met by using the Program Sequencer ADSP-1401. Like other
functional units, the sequencer also gets its instructions from the microcode
memory and generates the address of the next microinstruction depending on the
current instruction.

The microcode details for the microengine of the IFU are given in
appendix C in the form of a definition file, which is input to the definition phase
of the meta-assembler, which in turn generates a compact definition file. This
compact definition file is then used by the simulator and the assembly phase of
the meta-assembler to run and to assemble the programs for the IFU.

PROGRAM SEQUENCER: 

During each microinstruction, the ADSP-1401 monitors the conditions and
instruction to determine the next microprogram address. This address can come
from one of several sources : the internal stack, the jump address space in the
internal RAM, the data port, the interrupt vectors, or the program counter. The
detailed description of the chip is given in appendix D.

The external flag input to the sequencer chip may be used to control
conditional instructions. Two instructions make explicit use of FLAG as their
condition (JPCOF and JPOT ), while others employ a conditional mode selection
(UNCONDITIONAL, NOT FLAG, FLAG or SIGN) to be specified as part of their opcode.
Here the FLAG input has a number of sources. One source can be selected at a
time through the control lines to the multiplexer from the microcode [fig. 3.2]. For
the IFU there are 4 sources : 'end_c®', indicating end of the execution of an
algorithm; 'flg', which can be asserted by any cell under exceptional conditions;
'cmpz', generated by the address generator; and '0', no flag input. No flag input
is the default choice.

Fig. 3.2 Microengine and Address Generation Unit for IFU.

Note: In all figures, signals whose names end with * originate from the
microcode memory.

ADDRESS GENERATOR : 

The interface unit (IFU) has to generate the address on the addr-bus, which
flows systolically from cell to cell. It also has to address the data memory in the
IFU. The ADSP-1410 address generator has been used for flexible address
generation. This device rapidly generates the data memory addresses required by
routines such as digital filters, FFTs, matrix multiplications and DMAs. Circular
buffers and modulo addressing for data memories can be implemented without
overhead. In a single instruction the device can :

- output a 16-bit memory address;

- modify this memory address; and

- detect when the address value has moved to or beyond a pre-set
boundary and conditionally loop back to the top of a circular buffer.

The details of the chip are given in the appendix D. 
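
The three steps listed above (output, modify, wrap at a boundary) can be sketched in software as follows. This is only a behavioural illustration of circular-buffer addressing; it does not reproduce the ADSP-1410 instruction set or register names, and `circular_addresses` with its parameters is a hypothetical helper that assumes a positive step no larger than the buffer length.

```python
def circular_addresses(base, length, step, count):
    """Generate `count` addresses that walk a circular buffer.

    The buffer occupies addresses base .. base + length - 1. After each
    output the address is modified by `step`; when it reaches or passes the
    boundary it loops back towards the top of the buffer.
    """
    if not 0 < step <= length:
        raise ValueError("step must be positive and no larger than the buffer length")

    boundary = base + length
    address = base
    generated = []
    for _ in range(count):
        generated.append(address & 0xFFFF)   # output a 16-bit memory address
        address += step                      # modify the address
        if address >= boundary:              # boundary reached or passed
            address -= length                # loop back into the buffer
    return generated


if __name__ == "__main__":
    print([hex(a) for a in circular_addresses(0x100, 4, 1, 6)])
```

In the hardware all three steps happen within one instruction, which is why the text describes circular buffers as having no overhead; the loop above spends explicit statements on each step only for clarity.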

3.1.2.1 Data transfer between sequencer and address generator :

The data transfer between the sequencer and the address generator can be used
for saving the 1410 registers on the 1401's subroutine stack during a context
switch or subroutine call. In addition, it allows use of the 1410 as an ALU for
program addresses. For instance, if this system were performing an FFT, the 1401
would need the shifting function of the 1410's ALU for calculating the number of
butterflies per group and the number of groups per stage. Both are used by the 1401
for counting loops in the FFT programs.
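
A minimal sketch of how shifts yield these loop counts, assuming a radix-2 FFT of size N (a power of two) in which the first stage has N/2 groups of one butterfly each; the ordering of stages and the function name `fft_loop_counts` are assumptions made for illustration, not details given in the thesis.

```python
def fft_loop_counts(n):
    """Return (groups per stage, butterflies per group) for each FFT stage.

    Both counts are obtained purely by shifting, the operation the text says
    the sequencer borrows from the address generator's ALU.
    """
    if n < 2 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 2")

    stages = n.bit_length() - 1           # log2(n)
    counts = []
    for stage in range(stages):
        groups = n >> (stage + 1)         # halves at every stage
        butterflies = 1 << stage          # doubles at every stage
        counts.append((groups, butterflies))
    return counts


if __name__ == "__main__":
    print(fft_loop_counts(8))
```

Every stage performs N/2 butterflies in total, so the product of the two counts is constant across stages.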

The output and input arrangement of the ADSP-1401 and ADSP-1410 permits data
to be output during clock HIGH, while inputting of data is performed in the clock LOW
phase, thus allowing reading and writing of data in a single clock cycle.

The circuit in fig. 3.2 allows the following data transfers in a single clock
cycle.

1) constant field to 1401 (ken* =1, den* =dstb* =0).

2) constant field to 1410 (ken* =1, den* =dstb* =0).

3) 1401 <-> 1410 (ken* =0, den* =dstb* =1).
3.1.3 MEMORY STRUCTURE :


The interface unit consists of four kinds of memories. The X, YA and YB
memories are accessible only sequentially. The fourth memory is the data RAM. The X
memory holds the data on which the desired computations are to be performed
by the SASP array. The X memory can also be written from the data memory, when
the data is too large to fit into the X memory. The YB memory is used for intermediate
results. YA can be used for final or initial results. Details of these memories are
as follows.

a) X memory [fig. 3.3] :

The host can read and write into the X memory. The IFU can read it to
transfer X data to cell 1 over the X-bus, and can write it from the data memory
whenever the previously loaded data is used up. Since memory addressing is through a
counter, the memory can be accessed only sequentially. Simultaneous read and
write is not possible. The signals rdx* and wrx* are from the microcode memory.
The read/write signals from the host are not shown. The signal clrx* clears the
counter to zero.

Fig. 3.3 X memory and data memory for the IFU.

The signal rdx* strobes the X data into the X register. To write this
data into the X queue of cell 1, the IFU asserts (through microcode) wrxq*, which is
valid if the X queue of cell 1 is not full, and the signal then acts as the write
signal for the X queue of cell 1. Thus the X register data is written into the X
queue of cell 1.

b) YA memory [fig. 3.4] :

It can be read and written by the host. The IFU reads the YA memory to
transfer YA data (which may be initial results) over the Y bus through the SASP array.
Cell N or cell 1 (depending on the Y bus direction) can write the YA memory (final
results). Since here also the addressing is through a counter, the memory can be
accessed only sequentially. Only a read or a write can be done at a time. The Y bus, which
goes through the SASP array, is shared by the YB memory as well. Output from the YA
memory is strobed into the Y register through rdya*. To send the data written into the Y
register to cell 1/cell N, the IFU microengine asserts wryq*, which is valid if
the Y queue of cell 1/cell N (depending on the Y bus direction) is not full, and
then acts as the write signal for the Y queue of cell 1/cell N. Thus the Y register data is
written into the Y queue of cell 1/cell N.

c) YB memory [fig. 3.4] :

This is used for storage of intermediate results and can be accessed by the
IFU and cell 1 or cell N (depending on the Y bus direction) at the same time. The
IFU can read it to transfer data onto the Y bus and cell N/cell 1 can write into it.


























WRTC1  - Terminal count for write counter 1.
WRTC2  - Terminal count for write counter 2.
RDTC1  - Terminal count for read counter 1.
RDTC2  - Terminal count for read counter 2.

Fig. 3.4 b Signal inputs to the EPROM controller of YB memory.





The simultaneous read and write operation is achieved by using two RAMs,
interleaved dynamically. That is, when the first RAM was written and read in previous
accesses, on a simultaneous read and write request, writing is started into the
second RAM (YB2) and the first RAM is read at the same time. When the data written
into the first RAM (YB1) gets exhausted, the IFU microengine starts reading from
YB2 and writing continues in YB2. Writing then switches to YB1 if both read and
write occur simultaneously.

Writing and reading of YB is controlled using an EPROM, flip-flops and gates
as shown in fig. 3.4.

The two YB RAMs have their own addressing counters, one for read and another
for write operation. Reading is allowed when data is already written in the memory.
If the IFU tries to read an empty YB memory, the clock of the IFU's sequencer is
skipped and this skip is lifted only when the memory gets a new data item. This
skipping elongates the read YB cycle and the microengine effectively waits for YB data to
arrive from cell N/cell 1.

Some signals are explained below to show how simultaneous reading and
writing is achieved.

WRS1_S2 and RDS1_S2 are two signals (fig. 3.4b), which are LOW when the previous
write (read) was into (from) YB1, otherwise are HIGH.

TC1 and TC2 : (Terminal counts for YB1 and YB2) When the terminal count for the
write counter of YB1 (YB2) is reached, TC1 (TC2) is set HIGH, and it is cleared to
LOW when the terminal count for the read counter of YB1 (YB2) is reached.

CMP1 and CMP2 : When both the read counter and write counter for YB1 (YB2) are
pointing to the same location, CMP1 (CMP2) is made HIGH. A comparator is used for
the comparison of read and write counter values.

YB1full : This is asserted when YB1 memory is full. It is ensured when terminal
count TC1 is high and both read and write counters are equal, i.e.

$$\mathrm{YB1full} = \mathrm{cmp1} \cdot \mathrm{TC1} \qquad (1)$$

YB2full : Similarly

$$\mathrm{YB2full} = \mathrm{cmp2} \cdot \mathrm{TC2} \qquad (2)$$

YBfull : This is asserted when both YB1 and YB2 memories are full and no more
data can be written into YB memory, i.e.

$$\mathrm{YBfull} = \mathrm{YB1full} \cdot \mathrm{YB2full} = \mathrm{cmp1} \cdot \mathrm{TC1} \cdot \mathrm{cmp2} \cdot \mathrm{TC2} \qquad (3)$$

YB1empty : This is asserted when YB1 memory is empty. It is ensured when terminal
count TC1 is low and both read and write counters are equal, i.e.

$$\mathrm{YB1empty} = \mathrm{cmp1} \cdot \overline{\mathrm{TC1}} \qquad (4)$$

YB2empty : Similarly

$$\mathrm{YB2empty} = \mathrm{cmp2} \cdot \overline{\mathrm{TC2}} \qquad (5)$$

YBempty : This is asserted when both YB1 and YB2 memories are empty and no more
data can be read from YB memory, i.e.

$$\mathrm{YBempty} = \mathrm{YB1empty} \cdot \mathrm{YB2empty} = \mathrm{cmp1} \cdot \overline{\mathrm{TC1}} \cdot \mathrm{cmp2} \cdot \overline{\mathrm{TC2}} \qquad (6)$$
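
The way a comparator output and a terminal-count flag together distinguish "full" from "empty" can be checked with a small model of one YB RAM. The sketch assumes that the terminal count is reached when a counter wraps back to location zero and that the empty flags use the complemented terminal count (TC low), as the prose above states; the class name `YBBank` and its methods are hypothetical, and overflow and underflow are not guarded.

```python
class YBBank:
    """Model of one YB RAM with separate read and write address counters."""

    def __init__(self, size):
        self.size = size
        self.write_counter = 0
        self.read_counter = 0
        self.tc = False   # set when the write counter wraps, cleared when the read counter wraps

    def write(self):
        self.write_counter = (self.write_counter + 1) % self.size
        if self.write_counter == 0:
            self.tc = True

    def read(self):
        self.read_counter = (self.read_counter + 1) % self.size
        if self.read_counter == 0:
            self.tc = False

    @property
    def cmp(self):
        """Comparator output: both counters point to the same location."""
        return self.write_counter == self.read_counter

    @property
    def full(self):
        return self.cmp and self.tc        # counters equal, TC high

    @property
    def empty(self):
        return self.cmp and not self.tc    # counters equal, TC low


if __name__ == "__main__":
    bank = YBBank(4)
    print(bank.empty, bank.full)
    for _ in range(4):
        bank.write()
    print(bank.empty, bank.full)
```

Equal counters alone are ambiguous, since they occur both after reset and after a full lap of writes; the terminal-count flag records which of the two counters wrapped last and so resolves the ambiguity.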

When YB memory is reset, both read and write counters point to location
zero. The first write is always in YB1, and the first read (before any data has been
written in the memory) is blocked by skipping of the sequencer's cycles. Writing
continues in YB1 until a simultaneous read and write occur. In the latter case,
writing is started in YB2 and reading is done from YB1. When YB1 is empty,
read cycles are started from YB2, and writing in YB2 continues until a simultaneous
read/write occurs.

Following are the conditions when writing switches from YB1 to YB2 (YB2 to YB1):

i> Occurrence of simultaneous read and write when both reading and
writing were on in YB1 (YB2).

OR

ii> When YB1 (YB2) is full.

Switching from YB1 to YB2 (YB2 to YB1) in read cycles takes place when YB1 (YB2)
is empty.

The logic developed is as follows. 


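$$\text{wryb1} = \text{wryb} \cdot \overline{\text{WRS1\_S2}} \cdot \overline{(\text{cmp1} \cdot \text{TC1} + \text{rdyb} \cdot \overline{\text{RDS1\_S2}})} + \text{wryb} \cdot \text{WRS1\_S2} \cdot (\text{cmp2} \cdot \text{TC2} + \text{rdyb} \cdot \text{RDS1\_S2}) \tag{7}$$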

Here the term cmp1.TC1 indicates whether YB1 is full or not, and 'rdyb'
indicates a simultaneous read. Thus the first expression indicates that YB1 is written
when the previous write was in YB1, YB1 is not full, and there is no
simultaneous read in the YB1 memory. The second expression indicates that YB1 is
written when the previous write was in YB2 and YB2 is full or there is a
simultaneous read of YB2 memory.

Similarly 

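$$\text{wryb2} = \text{wryb} \cdot \text{WRS1\_S2} \cdot \overline{(\text{cmp2} \cdot \text{TC2} + \text{rdyb} \cdot \text{RDS1\_S2})} + \text{wryb} \cdot \overline{\text{WRS1\_S2}} \cdot (\text{cmp1} \cdot \text{TC1} + \text{rdyb} \cdot \overline{\text{RDS1\_S2}}) \tag{8}$$

For reading of YB

$$\text{rdyb1} = \text{rdyb} \cdot \overline{\text{RDS1\_S2}} \cdot \overline{\text{YB1empty}} + \text{rdyb} \cdot \text{RDS1\_S2} \cdot \text{YB2empty} \tag{9}$$

Similarly

$$\text{rdyb2} = \text{rdyb} \cdot \text{RDS1\_S2} \cdot \overline{\text{YB2empty}} + \text{rdyb} \cdot \overline{\text{RDS1\_S2}} \cdot \text{YB1empty} \tag{10}$$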

The IFU sequencer's cycle is skipped when the IFU tries to read an empty YB
memory.

The EPROM controller implements equations 1 to 10.
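
The full and empty conditions above reduce to two inputs per bank: the comparator output and the terminal-count flag. The following is a minimal sketch of that logic in Python, not the EPROM contents; the function names are illustrative assumptions, and the arguments stand for the CMP and TC signals described in the text.

```python
def bank_full(cmp: bool, tc: bool) -> bool:
    # A bank is full when its counters are equal and its terminal count is set.
    return cmp and tc


def bank_empty(cmp: bool, tc: bool) -> bool:
    # A bank is empty when its counters are equal and its terminal count is clear.
    return cmp and not tc


def yb_full(cmp1: bool, tc1: bool, cmp2: bool, tc2: bool) -> bool:
    # YB memory is full only when both banks are full.
    return bank_full(cmp1, tc1) and bank_full(cmp2, tc2)


def yb_empty(cmp1: bool, tc1: bool, cmp2: bool, tc2: bool) -> bool:
    # YB memory is empty only when both banks are empty.
    return bank_empty(cmp1, tc1) and bank_empty(cmp2, tc2)
```

When the counters differ (cmp LOW) a bank is neither full nor empty, which is the state in which both reading and writing may proceed.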



d> DATA MEMORY : 


For a large data input on which computation is to be performed, the X memory
size may not be sufficient. In that case the data can be stored in the data memory
(fig. 3.3), and when the X memory is empty, it can be loaded from the data
memory. The address for the data memory is generated by the address generator in
the IFU.

3.1.4 Run Time Flow Control :


Flow control for the inter cell communication channel is assumed to be
implemented in the hardware. When a cell tries to read an empty queue, all the
computations and the sequencer's sequencing are blocked until a data item arrives in
the queue. Similarly, when a cell tries to write to a full queue of a neighbouring
cell, the sender is blocked until a data item is removed from the full queue. The
blocking of the cells is transparent to the program, and hence the programmer need
not keep track of data flow through the array on a cycle by cycle basis while
programming. The state of all the computation units on the data path freezes for
the duration the cell is blocked. All other cells in the array continue to
operate normally. The data queues of a blocked cell are still able to accept
inputs.

To achieve this run time flow control, the following signals are generated. 

a> Communication control signals: [fig. 3.5]

Xqfull -> To indicate to the IFU that no more data can be written into the X
queue of cell 1. If the IFU tries to write into the X queue, it will be blocked.

Yqfull1 -> To indicate Y queue of cell 1 is full.

YqfullN -> To indicate Y queue of cell N is full.

One of the above two Yqfull signals is selected by the multiplexer as per the Y
bus direction.

Addrqfull -> To indicate address queue of cell 1 is full.

wrya -> Write YA memory signal from both cell 1 and cell N. One signal
is selected by the mux.

wryq from cell -> Write Y queue signal from cell 1/cell N, which acts
as the write YB signal for YB memory in the IFU.

wrxq* -> Signal from the IFU microcode to cell 1 to write X memory data
into X queue of cell 1.

wryq* from IFU -> Signal from the IFU microcode to cell 1/cell N to
write into Y queue of cell 1/cell N.

wraddrq* -> Signal from the IFU microcode to cell 1 to write into the
address queue of cell 1.

ybfull -> Signal to cell 1/cell N to indicate YB memory is full.

Fig. 3.5 IFU - CELL Communication Control signals.

b> CELL - CELL communication control signals : (For cell n)

The signals coming to the cell n are as follows.

xqfull -> To indicate that X queue of cell n+1 is full.
yqfulln+1 -> To indicate that Y queue of cell n+1 is full.
yqfulln-1 -> To indicate that Y queue of cell n-1 is full.
One of the above two yqfull signals is selected as per Y bus direction.

addrqfull -> To indicate that Address queue of cell n+1 is full.
wryqn+1 -> Write Y queue signal from cell n+1.
wryqn-1 -> Write Y queue signal from cell n-1.

One of the above two signals acts as the 'wryq*' signal for the present cell's Y queue
as per Y bus direction.

wrxq -> Signal from cell n-1 to write into X queue of present cell.
wraddrq -> Signal from cell n-1 to write into Address queue of present
cell.

There are some 'local' control signals to check the data flow.

xqempty -> Indicating that X queue of the present cell is empty and a
read X queue attempt will block the cell.

yqempty -> Indicating that Y queue of the present cell is empty.
addrqempty -> Indicating that Address queue of the present cell is
empty.

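Thus for a cell, a cycle is skipped (the cell is blocked) when the 'cycleskip' signal is HIGH,
where

$$\text{cycleskip} = \text{xqempty} \cdot \text{rdxq*} + \text{yqempty} \cdot \text{rdyq*} + \text{addrqempty} \cdot \text{rdaddrq*} + \text{xqfull} \cdot \text{wrxq*} + \text{yqfull} \cdot \text{wryq*} + \text{addrqfull} \cdot \text{wraddrq*}$$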
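
The cycle skip condition can be sketched as a small Python function. This is an illustrative model only: the dictionary layout and the function name are assumptions, the `empty` flags belong to the present cell's own queues, and the `full` flags are the ones received from the neighbouring cell.

```python
def cycle_skip(empty: dict, full: dict, reads: set, writes: set) -> bool:
    """Return True when the cell must be blocked for this cycle.

    empty  -- local queue-empty flags, keyed by "x", "y", "addr"
    full   -- neighbour queue-full flags, keyed the same way
    reads  -- queues the current microinstruction reads
    writes -- queues the current microinstruction writes
    """
    blocked_read = any(empty[q] for q in reads)    # read from an empty queue
    blocked_write = any(full[q] for q in writes)   # write to a full queue
    return blocked_read or blocked_write


empty = {"x": True, "y": False, "addr": False}
full = {"x": False, "y": True, "addr": False}
skip_on_x_read = cycle_skip(empty, full, {"x"}, set())
skip_on_y_read = cycle_skip(empty, full, {"y"}, {"x"})
```

A microinstruction that touches no queue never blocks, which matches the sum-of-products form: every term needs an active read or write signal.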


3.2 The SASP Cell Unit:


Fig. 3.6 shows the block diagram of the SASP cell. Each cell is implemented
as a programmable horizontal machine, with its own microsequencer and program
memory. The microinstruction details for the cell unit are given in the Appendix C.
The data cross bar of the cell provides very high intra cell bandwidth, and the X, Y
channels with their associated queues provide high inter cell bandwidth.

3.2.1 Microengine and Address generation unit :

The structure of the microengine is similar to the microengine of the IFU.
Here the FLAG input to the sequencer has the following sources.





















cmpz , fig as described in section 3.1.2 (fig. 3.2).

OVRFLO , UNDFLO , INVALOP -> These are output signals from the ALU
chip.

Lessthan , greaterthan , equal -> These signals are derived from the
output signals of the ALU chip for the compare instruction.

Data independent addresses are generated in the IFU, whereas data
dependent addresses are generated in the SASP cells. Thus the address generator
is used as the local address generation unit.

Addresses to local data memory and scratch pad memory are fed from the
address cross bar, which has inputs from the address queue and the address port of
the address generator ADSP-1410. The sequencer ADSP-1401 and address generator
ADSP-1410 data ports get their data from the data cross bar. Thus the data to the
data ports can be from the constant (data) field of the microinstruction or from
any of the inputs to the data cross bar. Since the data ports of the sequencer and
address generator are 16 bits wide and the data cross bar output is 32 bits, only the least
significant 16 bits are considered. This feature enables the address generator
unit to calculate data dependent addresses.

3.2.2 Inter cell Communication :

Each cell can communicate with its left and right neighbours through the X, Y
and Addr buses. A queue is associated with each channel to increase the inter cell
bandwidth.

a> X queue -> The input to the X queue can be from the cell n-1 (x previous) or
from the output of the X queue itself (x current). This local feedback can be used to
give delay to the X data, which is also useful to simulate multiple cells using a
single cell. The microcode signal inputs to the X queue are: rdxq* - to read the X
queue, wrxq* - signal from cell n-1 to write the queue, clrx* - to clear the read and
write counters of the queue, retransx* - to retransmit the X queue data from the
first physical location (read counter is reset to zero).

The details of the implementation of the queue structure are shown in fig.
3.7. It uses the FIFO CY7C420. The queue consists of a dual port RAM. The
following operations can be performed using the various control signals.

Resetting the FIFO : A master reset (MR) causes the FIFO to enter the empty
condition, signified by the empty flag (EF) being active, i.e. both read and write
counters are reset to zero.

Writing data to the FIFO : The availability of an empty location is indicated by the
inactive state of the full flag (FF). A falling edge of write (W) initiates a write
cycle.

Reading data from the FIFO : The falling edge of read (R) initiates a read cycle
if the empty flag (EF) is not active. The falling edge of R during the last read
cycle before the empty condition triggers EF active, prohibiting any
further read operation until a valid write.

Retransmit : The retransmit feature is beneficial when transferring packets of
data. A low pulse at RT resets the internal read pointer to the first physical
location of the FIFO. The write pointer is unaffected.

After a retransmit cycle, previously read data may be reaccessed beginning with
the first physical location.

The output signals EF and FF from the FIFO CY7C420 are used to signal the
queue condition (empty or full). The clrx* signal from the microcode memory gives the
master reset (MR) to the chip, and the retransx* signal acts as the retransmit (RT) signal
to the chip.




Fig. 3.7 Logic Diagram of CY7C420 FIFO 









b> Y queue -> The input to the Y queue can be from the cell n-1/cell n+1
(depending on the Y bus direction), i.e. Y previous, or from the local feedback. The
signals to the Y queue are similar to those of the X queue, and it is also implemented using
the CY7C420 FIFO chip.

c> Addr queue -> The input to the Address queue is from the previous cell (cell n-1)
and local feedback is not provided. The structure is the same as the X queue and Y queue.

The X register in the X data path (fig. 3.6) is strobed with the X data output
of the X queue by the signal rdxq*. When the wrxq* signal is given by cell n and
the X queue of cell n+1 is not full, this strobed data is written into the X queue
of cell n+1.

Similarly the Y register is strobed with the data from the yout cross bar
output. It is strobed only when the yout port of the data cross bar is enabled. The
data in the Y register is written into the Y queue of cell n-1/cell n+1 in a way
similar to the X queue.

The addr register function is the same as that of the X register.

These registers are used because, considering the distance between two
cell unit boards, the reading of the local queue and writing of the queue of a
neighbouring cell in a single cycle is difficult to realize in the hardware.

3.2.3 Cross bar :


Internal data bandwidth is often the bottleneck of a systolic cell. In the cell
the two floating point chips can consume up to four data items and generate two
results per cycle. The cross bar connecting the various data storage blocks supports this
high data processing rate. Moreover, the use of cross bar leads to complication
when compared to bus based systems. There are 5 input ports, 4 output ports and
one bidirectional port. An output port can output data from any of the inputs,
irrespective of other outputs.

The input ports are
XI -> from X queue.
YI -> from Y queue.
constf -> from the constant (data) field of the microcode.
mresult -> from the multiplier output.
alu_spout -> from the ALU output bus.

The bidirectional port is to/from the data memory.

The output ports are
Ain -> to ALU input.
Bport -> to B port of the register file.
iout -> to the internal data bus of the microengine (16-bit bus).
yout -> input to the Y register.

The cross bar can be realized using PALs or using multiplexers. The input
port for an output port is selected using 3-bit control signals from the
microcode.

For example, for the output port yout, the signals that select the various inputs,
which are encoded in 3 microcode bits, are as follows.
xbyout_xi -> input from XI, i.e. X queue.
xbyout_yi -> input from Y queue.
xbyout_dmout -> input from data memory.
xbyout_aluspout -> input from ALU output bus.
xbyout_mresult -> input from multiplier output.
xbyout_constf -> input from the constant (data) field of the microcode.
xbyouttri -> Tristate the output.
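
The selection scheme can be pictured as one multiplexer per output port. The Python sketch below is a behavioural illustration under assumed names: the port keys and the use of None for a tristated output are hypothetical, and the actual 3-bit encodings of the microcode fields are not modelled.

```python
def crossbar(select: dict, inputs: dict) -> dict:
    """Route input ports to output ports.

    select -- maps each output port to the name of its chosen input,
              or to None when the output is tristated
    inputs -- maps each input port name to the value present this cycle
    """
    return {out: (None if src is None else inputs[src])
            for out, src in select.items()}


inputs = {"xi": 1.5, "yi": 2.5, "dmout": 3.5,
          "aluspout": 4.5, "mresult": 5.5, "constf": 6.5}
# Two outputs may pick the same input, independently of each other.
routed = crossbar({"yout": "xi", "ain": "xi", "bport": "mresult", "iout": None},
                  inputs)
```

Each output makes its choice without regard to the others, which is the property the text attributes to the cross bar.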

3.2.4 DATA STORAGE UNITS : 


The local data storage units include a data memory, a register file and a
scratchpad memory.

a> Data Memory : The local data memory can be accessed in every clock cycle. It is
generally used for loading weights and coefficients, which are not transmitted during
the execution of an algorithm. Intermediate results can also be stored.

b> Register file : [ADSP-3128 (128 x 16)] (fig. 3.8)

It is a versatile data storage component which greatly expands the
computational bandwidth of a fast arithmetic processor. The ADSP-3128 also
simplifies processor design by permitting flexible data routing through its five
16-bit ports : two input ports, two output ports and a bidirectional port. Two
register files are used "horizontally", yielding 128 words of 32-bit storage. The
five ports allow six 32-bit transfer operations per cycle for single precision
mode using two chips.

Register to register transfers are made via the bidirectional E data port.
Writes to the RAM occur in clock HI. Note that data written in clock HI is available
to be read in the same clock cycle.

The arrangement of the register file ports is as shown in fig. 3.8. Ports
D and C provide data to the ALU and multiplier chips respectively. Port E is
bidirectional and can be used to receive input from the cross bar, the scratchpad or
the ALU output. It can act as an output for writing into the scratchpad, into the
ALU port, or as input to the cross bar through the bidirectional buffer. The input port A
accepts the multiplier output.

Thus the register file can be used for storing the intermediate results and
to route the data/results to/from the various units.

c> Scratchpad Memory : The scratchpad memory can be used to hold scalars,
floating point constants and small arrays. The addition of the scratchpad
increases memory bandwidth and improves throughput for those programs
operating mainly on local data.

Addresses to the data and scratchpad memory come from the address
cross bar, whereas those for the register file come from control bits of the
microcode.

3.2.5 Computational Units : [Fig. 3.9]

The cell uses a floating point multiplier ADSP-3210 and a floating point
ALU ADSP-3220 conforming to IEEE standard 754. 32-bit twos-complement fixed-point
operations are also supported by these chips. Both these units have two-stage
pipelining.

The ADSP-3210/3220 share a common architecture [Fig. 3.9]. The input
registers can be read to the chip's computational circuitry as they are loaded. At
the end of the first processing clock cycle, partial results are clocked into a set of
internal pipeline registers. At the end of the second processing cycle, results are
clocked into an output register. The contents of the output register can then be
driven off chip.




Because all input and output data are internally registered and because of 
the single level of internal pipeline registers, operations can be overlapped for 
high levels of pipelined throughput. The table illustrates a typical sequence of 
pipelined throughput. 


Time       Load input    First stage   Second stage   Output result
(cycles)   data

1          Data set A
2          Data set B    Data set A
3          Data set C    Data set B    Data set A
4          Data set D    Data set C    Data set B     Data set A

Single-precision floating point data format is as follows.

sign (s)    exponent (e)    fraction (f)
s           e7 ... e0       f22 ... f0






























The following mnemonics indicate types of floating point numbers in the
computations.

Mnemonic   Exponent   Fraction   Value                       Name
NAN        255        non-zero   undefined                   not-a-number
INF        255        zero       (-1)^s (infinity)           infinity
NORM       1 - 254    any        (-1)^s (1.f) 2^(e-127)      normal
DNRM       0          non-zero   (-1)^s (0.f) 2^(-126)       denormal
ZERO       0          zero       0                           zero
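
The classification by exponent and fraction fields can be checked with a few lines of Python. This is a sketch for illustration, not part of the simulator; the function name `classify` is an assumption, and the standard `struct` module is used to obtain the 32-bit single precision pattern.

```python
import struct


def classify(x: float) -> str:
    """Return the mnemonic for x when stored as IEEE 754 single precision."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    exponent = (bits >> 23) & 0xFF   # 8-bit biased exponent e7..e0
    fraction = bits & 0x7FFFFF       # 23-bit fraction f22..f0
    if exponent == 255:
        return "NAN" if fraction else "INF"
    if exponent == 0:
        return "DNRM" if fraction else "ZERO"
    return "NORM"
```

For instance, 1e-40 lies below the smallest normal single precision value, so it is stored with a zero exponent and a non-zero fraction.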


STATUS FLAGS : These chips generate on dedicated pins the following exception
flags specified in the IEEE standard : Overflow (OVRFLO), Underflow (UNDFLO), and
Invalid operation (INVALOP).

Conditions that cause the assertion of INVALOP are :

- NAN input to computational circuitry.

- Multiplication of either +/- INF by either +/- ZERO.

For comparison operations in the ALU, the OVRFLO, UNDFLO, and INVALOP status
outputs are used to indicate four comparison conditions.

-"Less than" is signaled by the assertion of UNDFLO (while OVRFLO is low).
-"Greater than" is signaled by the assertion of OVRFLO (while UNDFLO is low).
-"Equal" is signaled by not asserting either UNDFLO or OVRFLO.

-"Unordered" is signaled by the assertion of INVALOP, caused by attempting a
comparison with at least one NAN operand.
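
The decoding of the three status outputs after a compare instruction can be sketched as follows. The function name is an assumption, and the case where both OVRFLO and UNDFLO are high is not described in the text, so the sketch reports it separately rather than guessing.

```python
def decode_compare(ovrflo: bool, undflo: bool, invalop: bool) -> str:
    """Map the ALU status pins to a comparison outcome."""
    if invalop:
        return "unordered"        # at least one NAN operand
    if undflo and not ovrflo:
        return "less than"
    if ovrflo and not undflo:
        return "greater than"
    if not ovrflo and not undflo:
        return "equal"
    return "undefined"            # both flags high: not covered by the text
```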

Instructions And Operations : The ADSP-3210 multiplier executes the same
instruction every cycle : multiply. It need not be specified explicitly in the
microcode.

The ADSP-3220 ALU, in contrast to the multiplier, is instruction driven, with
the operation specified by I8-0.

Since only two registers, A0 and B0, of both the chips are used, only two
signals are required (Aen* and Ben*) to load the input registers. This is done to
simplify the microcode. The oen* signal is used to enable the output port. For the
multiplier, the microcode signal msp*/mfixed* specifies either a single precision
floating point multiplication or a fixed point multiplication of the two register
inputs A and B. For the ALU the instruction field is used to specify these options.

The detailed description of the multiplier and ALU (ADSP-3210/3220) chips is
given in the Appendix D.



4. SIMULATOR 


4.1 INTRODUCTION . 


Simulation is the process by which understanding of the behaviour of an
already existent or planned physical system is obtained by observing the
behaviour of the model representing the system. A simulation study must have a
purpose, and there are many reasons why simulation is valuable. For example,
simulation may be performed to check and optimize the design of a system before
its construction, thus helping to avoid costly design errors and ensuring safe
designs. Other purposes include analysis, performance evaluation, cost
effectiveness, forecasting, safety, teaching and decision making.

Simulation is thus a very widely used technique. However, simulation of a
complex processor or a complex computer system as a tool for software
development is of recent origin. The main uses of such a simulator are ease of
software development, debugging, testing, modification and evaluation of the
performance of the system. The main aim of the simulator described in this chapter
is to provide a facility to develop signal processing algorithms on the SASP
system and to evaluate the performance of the SASP system. The user has almost
all the freedom for software development as if he is working with a hardware
system. The simulator program uses the object code assembled by the meta-assembler.
It is not a real time simulator. Simulation is cycle by cycle, and one
cycle is equivalent to one clock cycle in the hardware system. In the hardware one
cycle is equal to 0.1 microseconds for a 10 MHz clock.



4.2 SIMULATOR BLOCKS:


The main building blocks of the simulator are the simulations of the various
components of SASP. The following chips have been simulated.
i> Sequencer ADSP-1401
ii> Address generator ADSP-1410
iii> ALU ADSP-3220
iv> Multiplier ADSP-3210.

In addition a FIFO queue has also been simulated, as it is extensively used in the
SASP architecture. Each of these blocks is explained below.

4.2.1 Simulation of the Sequencer [ADSP-1401]:

The details of the chip are given in the Appendix D. All the features of the
chip are simulated. Each of the sequencer instructions is simulated using a
separate routine, and the registers and signals of the chip are expressed in the
form of global variables. The simulation process is as follows.

In an instruction cycle, first the subroutine for the instruction
corresponding to the opcode presented to the sequencer is executed. Then the
program checks for an interrupt request. If a request is pending, the program
counter is loaded with the interrupt vector. At the end, the program checks the
interrupt pins and internal interrupt sources for an interrupt. If any of these
sources is active, a request is raised. A flowchart is given in figure 4.1.

The ADSP-1401 processes eight external and two internal interrupts. The
two internal interrupts are reserved for stack overflow - IR9 and counter
underflow - IR0 (see Appendix D). The simulator has internal cycle counters, which
can be set to model interrupting devices. The user can activate any one of the 8
external interrupting sources (IR1-IR8) by specifying the time interval in cycles
between two consecutive interrupts. When the interval time expires, an interrupt
is issued at the corresponding level. To disable the interrupt, the interrupt
period should be set to zero.

Fig. 4.1 Simulation of the Sequencer ADSP-1401.
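
The periodic interrupt sources can be modelled with one down-counter per level. The sketch below is a guess at the behaviour described above, not the simulator's source; the class and method names are hypothetical.

```python
class InterruptTimers:
    """Cycle counters that raise IR1..IR8 at user-specified intervals."""

    def __init__(self):
        # A period of 0 means the level is disabled.
        self.period = {level: 0 for level in range(1, 9)}
        self.count = {level: 0 for level in range(1, 9)}

    def set_period(self, level: int, cycles: int) -> None:
        self.period[level] = cycles
        self.count[level] = cycles

    def tick(self) -> list:
        """Advance one simulated cycle; return the levels raised."""
        raised = []
        for level, period in self.period.items():
            if period == 0:
                continue
            self.count[level] -= 1
            if self.count[level] == 0:
                raised.append(level)
                self.count[level] = period   # restart the interval
        return raised


timers = InterruptTimers()
timers.set_period(3, 2)                      # IR3 every 2 cycles
history = [timers.tick() for _ in range(4)]
```

Calling `tick()` once per simulated cycle keeps the interrupt model in step with the cycle by cycle simulation.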


The other major features of the sequencer simulation are:
1> Internal 64-word RAM implementing two distinct stacks: a subroutine
stack and a register stack. When stack overflow is detected, interrupt IR9 is
raised.

2> Four independent 16-bit counters are used for maintaining loops and
event tracking.

3> When the sign bit of the status register is set, the IR0 interrupt is raised.

4.2.2 Simulation of Address generator [ADSP-1410]:


The simulation process is similar to the simulation of the ADSP-1401 sequencer.
Here also, in every cycle the subroutine for the instruction corresponding to the
opcode for the address generator (in the microcode) is executed. The instruction
set of the address generator is given in appendix C.

4.2.3 Simulation of ALU and multiplier chips (ADSP-3220 and ADSP-3210):


The two stage pipelining is simulated in the following way. There are two
registers A and B for both ALU and multiplier. Let us define an array 'result' of
dimension 2. Now the following table illustrates the typical sequence of pipelined
operations.



Time       Load input data   First stage         Second stage      Output results
(cycles)   in A and B Reg.   result0 =           result1=result0   result1
                             operation A and B

1          Data set P
2          Data set Q        Data set P
3          Data set R        Data set Q          Data set P
4          Data set S        Data set R          Data set Q        Data set P


Here also, for each instruction of the ALU chip, one subroutine is executed in the
simulator.
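
The two-element 'result' array can be sketched in Python as follows. This is an illustrative model under assumptions: the class name, the `clock` method and the use of None for an empty stage are not taken from the simulator source. The stages are advanced from the back of the pipeline so that no value is overwritten before it has moved on.

```python
class TwoStageUnit:
    """Behavioural model of a two-stage pipelined ALU or multiplier."""

    def __init__(self, op):
        self.op = op                  # e.g. lambda a, b: a * b
        self.regs = None              # input registers A and B
        self.result = [None, None]    # result[0]: first stage, result[1]: second
        self.output = None            # output register

    def clock(self, a_b=None):
        """Advance one cycle, optionally loading a new (A, B) pair."""
        self.output = self.result[1]
        self.result[1] = self.result[0]
        self.result[0] = None if self.regs is None else self.op(*self.regs)
        self.regs = a_b
        return self.output


mul = TwoStageUnit(lambda a, b: a * b)
stream = [(1, 2), (3, 4), (5, 6), (7, 8), None, None, None]
outputs = [mul.clock(pair) for pair in stream]
```

As in the table, the first data set loaded in cycle 1 appears at the output in cycle 4, and one result follows per cycle after that.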

4.2.4 Simulation of a queue structure (FIFO CY7C420):

The logic of the simulation of a queue structure is similar to the logic used
in the hardware implementation of YB memory, described in section 3.1.3.

The size of the queue is fixed by the value given in the architecture
description file. The status of the queue is given by two flags, Qfull and Qempty.
Again the variable 'TC' is used, which is set when the terminal count for the write
counter is reached and is reset to zero when the terminal count for the read counter
is reached. The flowcharts for the 'Read Process' and the 'Write Process' are
given in figure 4.2 and figure 4.3 respectively.
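
A queue with read and write counters and a 'TC' flag can be sketched as below. This follows the description in the text rather than the flowcharts, which are not reproduced here; the class name and the (ok, item) return convention are assumptions made for the example.

```python
class SimQueue:
    """Circular queue whose full/empty status uses a terminal-count flag."""

    def __init__(self, size: int):
        self.size = size
        self.mem = [None] * size
        self.rd = 0          # read counter
        self.wr = 0          # write counter
        self.tc = False      # set on write wrap-around, cleared on read wrap-around

    @property
    def qfull(self) -> bool:
        return self.rd == self.wr and self.tc

    @property
    def qempty(self) -> bool:
        return self.rd == self.wr and not self.tc

    def write(self, item) -> bool:
        if self.qfull:
            return False                 # caller must skip the cycle
        self.mem[self.wr] = item
        self.wr += 1
        if self.wr == self.size:         # terminal count of write counter
            self.wr = 0
            self.tc = True
        return True

    def read(self):
        if self.qempty:
            return (False, None)         # caller must skip the cycle
        item = self.mem[self.rd]
        self.rd += 1
        if self.rd == self.size:         # terminal count of read counter
            self.rd = 0
            self.tc = False
        return (True, item)
```

A failed read or write corresponds to the blocked case, where the simulator would skip the cycle of the cell or IFU concerned.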

4.3 SIMULATION PROCEDURE:

The simulator simulates the SASP system. An architecture description file is
input to the simulator to start a simulator session. By reading the architecture
description file, it configures itself to match the target system hardware. In the
architecture file the user can specify the sizes of the X queue, Y queue, Addr queue,
data memory and program memory for a cell, and the X, YA and YB memory sizes for the IFU.
Apart from these, the user can specify the array size, i.e. the number of cells in the
array.

Fig. 4.2 Flowchart for reading a Queue.

The simulator program first reads an architecture description file and 
compact definition files for the microcodes of the IFU and a cell. After 
configuring itself, it prompts the user for a command. The flow of the simulator 
main program is as shown in figure 4.4. 

The simulator contains a 'Command Table'. A binary search of the 'command 
table' is made to locate the command. If a valid command is found, then the 
corresponding subroutine is executed to serve that command. 
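
A binary search over a sorted command table can be sketched with Python's standard `bisect` module. The handler functions below are placeholders, and only command names that appear in the model session are used; the real table is documented in Appendix B and may differ.

```python
import bisect


def make_table(handlers: dict) -> list:
    # The table must be sorted by command name for binary search to work.
    return sorted(handlers.items())


def lookup(table: list, name: str):
    """Return the handler for name, or None if the command is invalid."""
    names = [entry[0] for entry in table]
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        return table[i][1]
    return None


table = make_table({
    "run": lambda: "running",          # placeholder handlers
    "exit": lambda: "leaving",
    "element": lambda: "select cell",
    "lx": lambda: "load X memory",
})
handler = lookup(table, "run")
```

An unknown name returns None, at which point a simulator would print an error and return to the prompt.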

Modularity of the program enables division of the problem into smaller
tasks. After completing the desired task, the simulator returns to the simulator
prompt mode until a logical termination of the simulator is asked for by the user by
giving the 'EXIT' command. The details of the commands are given in the appendix B.

The steps followed by the 'run' command subroutine are given in figure 4.5. 
In an instruction cycle, first some of the instructions from the microinstruction 
of IFU are executed. Then microinstructions for the array are executed starting 
from cell N to cell 1. At the end of the cycle remaining instructions for the IFU 
are executed. This sequence is followed to simulate parallel execution of all 
microinstructions for cells and the IFU. 

For the single step routine, the first 5 (a to e) steps are followed once
and then the simulator returns to the prompt.

The substeps followed by the first 3 steps are given in figures 4.6, 4.7 and
4.8.



Read architecture description file and
definition files for the microcodes of
the IFU and the cell unit.

Is a halt condition present?

Return to the 'SIM >' prompt.

Fig. 4.5 Steps followed by 'run' command subroutine.













Fig. 4.6 Flowchart for step 'a' in fig. 4.5. 








Check for the skipping of cell clock cycle:
i> If 'read X queue', check for Xqempty flag.
ii> If 'read Y queue', check for Yqempty flag.
iii> If 'read Addr queue', check for Addrqempty flag.
iv> If 'write Addr queue', check for Addrqfull flag.
v> If 'write X queue', check for Xqfull flag.
vi> If 'write Y queue', check for Yqfull flag.


Is cycle skip 'TRUE' ?



Fig. 4.7 Continued.

Fig. 4.8 Flowchart for step 'c' in fig. 4.5.




4.4 SPECIAL FEATURES:


1> During program development, the concept of modular programming has been
strictly followed. Each instruction of a device (e.g. sequencer) is simulated by a
separate subroutine.

2> Adequate documentation is provided in the source program to explain the
working of the simulator in general and its constituent subroutines in particular.

3> Addition of a new instruction for a device is not restricted by the simulator. For
addition of a new instruction, its name and opcode should be added in the
structure for the list of instructions, and a subroutine is to be added to execute
the instruction.

4> Since each of the simulator blocks is simulated using separate subroutines and
structures of variables, one can simulate some other architecture using these
basic blocks without any modifications in the software.

5> The total number of clock cycles required for the execution of a program is
calculated to give the user an idea of the actual execution time. This feature is
useful for evaluating the performance of the system.

6> Error messages are flashed on the terminal.

7> The usual debugging facilities are provided for the development of the user programs.
The facilities are

i> Load and display facilities.
ii> Trace facility.
iii> Break point facility.
iv> Execution facility, etc.

These facilities are described in the Simulator manual given in the Appendix B.



4.5 MODEL SESSION :


A model session for the matrix multiplication program is given. The
algorithm of the program is described in chapter 5.


Script started on Thu Jan 25 17:45:48 1990
$ ssim

_Arch_file: syst.arh
SIM>

SIM> lcode smul0.asm

ERROR :- Object file is not a valid file

SIM> lcode smul0.obj

SIM> lx -f
_Address: 0
_Length: 6

1 2 3 4 5 6

SIM>

SIM> element

Current Command cell: 0

_Number: 1

SIM> lcode

Obj file: smul1.obj

SIM> ldm -f
_Address: 0
_Length: 2

2 2

SIM>

SIM> element

Current Command cell: 1

_Number: 2

SIM> lcode

Obj file: smul2.obj

SIM> ldm -f
_Address: 0
_Length: 2

2 2

SIM>

SIM> element

Current Command cell: 2
_Number: 3

SIM> lcode

Obj file: smul3.obj

SIM> ldm -f
_Address: 0
_Length:

2 2

SIM>

SIM> run

Program terminated.

SIM> yb1 -f
rdcntr1=0
wrcntr1=9
_Address: 0
0000h :6.000000e+00
0001h :1.400000e+01
0002h :2.200000e+01
0003h :6.000000e+00
0004h :1.400000e+01
0005h :2.200000e+01
0006h :6.000000e+00
0007h :1.400000e+01
0008h :2.200000e+01
0009h :0.000000e+00

SIM> exit
_Verify: y
$

script done on Thu Jan 25 17:53:21 1990

5. TESTING OF THE SIMULATOR


5.1 INTRODUCTION:

To test the simulator, two algorithms, for matrix multiplication and
convolution, are developed. The number of cells in the array has been fixed in the
examples (equal to 3). However, this number is variable and can be changed by
changing the value in the architecture description file. Both the algorithms are
explained and the microprograms are also given. The microprograms are assembled
by the meta-assembler described in chapter 6. The simulator loads the object files
and the data before the start of the program execution.

5.2 MATRIX MULTIPLICATION ON THE SASP : [Fig. 5.1]

Let us consider two matrices A and B of size M x N and N x K
respectively. Then their product C is a matrix of size M x K.

Matrix A is stored in the X memory of the IFU row by row. Matrix B is stored in
the data memory of the cells, one column in each cell. Therefore an array of K
cells is required for this operation. The data is stored at the start in each cell
and the IFU as shown in figure 5.1.

The IFU sends the X memory data on to the X channel sequentially. A cell n
calculates a column of the result matrix using this X data on the X channel and
sends the results to the cell n-1 on the Y channel using the reverse direction of
the Y bus. At the end of the calculations, the cell sends the results written into
its queue by the cell n+1 to the cell n-1. The first cell in the array effectively
sends the results to the YB memory of the IFU.



[Figure: the IFU followed by CELL 1, CELL 2 and CELL 3 in a linear array.]

CELL i SPECIFICATIONS

Ports: xin, xout, yin, yout. Local data memory: b1i, b2i, ..., bNi. Accumulator: y.

    j = 1 ; m = 1
    xout = xin = xm
    Yj = Yj + xm * bmi
    m = m+1 ,  1 <= m <= N
    m = 1 ; yout = Yj ; j = j+1 ,  1 <= j <= M
    k = 1
    yout = yin
    k = k+1 ,  1 <= k <= (K-1)*M


Fig. 5.1 MATRIX MULTIPLICATION ALGORITHM.






Thus the first element of the matrix C, calculated by cell 1, is

$$c_{11} = a_{11} b_{11} + a_{12} b_{21} + \dots + a_{1N} b_{N1}$$

The reverse direction of the Y bus is used because the computations by cell 1 are
over before those of cell 3, and this feature is necessary to make use of the further
cycles of cell 1 for transferring the results to the YB memory.

The microprograms for the IFU and the cell 1 are given below. For other
cells the program is the same except for 1> the constant 'cell_no' and 2> the last
microinstruction at address 'end1'. The constant 'cell_no' is the serial number of a
cell in the array. The last microinstruction for all other cells is 'jpcnf' instead
of 'cont' for the cell 1. This instruction keeps the cell waiting till the signal
'program terminated' comes from the cell 1. The model session for this program is
shown in section 4.5.
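
The expected contents of YB memory can be checked with a purely functional Python model of the data flow. This is not the microprogram and it ignores timing; the function name is an assumption, and the ordering (all results of cell 1, then those forwarded from cell 2, and so on) is inferred from the description above. The data values are the ones loaded in the model session of section 4.5.

```python
def sasp_matmul(a_rows, b_columns):
    """Return the results in the order they would reach YB memory.

    a_rows    -- matrix A as a list of rows (streamed on the X channel)
    b_columns -- one column of B per cell, cell 1 first
    """
    yb = []
    for column in b_columns:                 # cell 1, then cell 2, ...
        for row in a_rows:                   # one inner product per row of A
            yb.append(sum(x * b for x, b in zip(row, column)))
    return yb


a_rows = [[1, 2], [3, 4], [5, 6]]            # X memory: 1 2 3 4 5 6
b_columns = [[2, 2], [2, 2], [2, 2]]         # data memory of cells 1 to 3
yb = sasp_matmul(a_rows, b_columns)
```

The values 6, 14 and 22 repeated three times agree with locations 0000h to 0008h displayed at the end of the model session.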


Program of matrix multiplication for the IFU.

M equ 20
N equ 20

        wrcntr(C0) & 2 8000h+M*N/2-2 & ken & clrx & celln_1 & rdx
loop:   dccntr(C0) & rdx & wrxq & celln_1
        jda(sign) loop & ken & wrxq & rdx & celln_1
        jpcnf & celln_1


;THIS PROGRAM IS FOR MATRIX MULTIPLICATION ON THE SASP.
;IT IS TO BE LOADED ON EACH CELL. NO. OF CELLS EQUAL TO K.
; A_matrix M*N
; B_matrix N*K
N equ 20
M equ 20
K equ 3
cell_no equ 1

        wrcntr(C1) & 2 8000h+M-2 & ken & xbiout_constf & rst
LOOP1:  wrcntr(C0) & 2 8000h+N-3 & ken & xbiout_constf
        yrtr(R0) & dsel & 2 0 & ken & xbiout_constf
        2 0 & ken & xbain_constf & reg_ewr & e_addr(#20h) & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        wrxq & yinc(C0)(R0) & rddm & xbbport_dmout & b_addr(#1) & reg_bwr &
          c_addr(#1) & reg_crd & mul_ben & e_addr(#20h) & reg_erd &
          alu_ben & d_addr(#20h) & reg_drd & alu_aen
LOOP2:  dccntr(C0) & rdxq & xbbport_xi & b_addr(#0) & reg_bwr & c_addr(#0) &
          reg_crd & mul_aen & sadd
        wrxq & yinc(C0)(R0) & rddm & b_addr(#1) & xbbport_dmout & reg_bwr &
          c_addr(#1) & reg_crd & mul_ben
        jda(sign) LOOP2 & ken & xbiout_constf & moen & aoen & aout_to_in &
          aluout_bufen & alu_ben & d_addr(#20h) & reg_drd & alu_aen &
          a_addr(#20h) & reg_awr
        cont & sadd
        cont
        moen & aoen & aout_to_in & aluout_bufen & alu_ben & d_addr(#20h) &
          reg_drd & alu_aen & a_addr(#20h) & reg_awr
        sadd
        cont
        dccntr(C1) & aoen & xbyout_aluspout
        jda(sign) LOOP1 & ken & xbiout_constf & wryq
        wrcntr(C1) & 2 K*M-cell_no*M-1 & ken & xbiout_constf
LOOP3:  dccntr(C1)
        jda(sign) end1 & ken & xbiout_constf
        rdyq & xbyout_yi
        jda(unconditional) LOOP3 & ken & xbiout_constf & wryq
end1:   cont


Performance results:

In the sample matrix multiplication of A(3x2) and B(2x3), the total number of
clock cycles required to execute the program is 70. The program performs
9*2 = 18 floating point multiplications and 9 floating point additions. Since
the matrices are small, most of the cycles are spent in the initial and final skews
in the operation of the cells and, finally, in transferring the results to the IFU.
Therefore the computation power of the SASP is not fully utilized by the
multiplication of small matrices.

In another example, for the multiplication of matrix A(20x20) and B(20x3), the
simulator takes 1525 clock cycles to execute the program. Here it does
(20*3)*20 = 1200 floating point multiplications and (20*3)*19 = 1140 floating point
additions. Thus for a 10 MHz clock rate a rate of 15 Mflops is achieved using 3 cells.
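
The arithmetic behind these figures can be checked with a short Python sketch. The helper `mflops` is a hypothetical name introduced here; it assumes the 10 MHz clock quoted in the text and counts one floating point operation per multiplication or addition.

```python
def mflops(multiplications, additions, clock_cycles, clock_mhz=10.0):
    """Floating point operations per microsecond, i.e. Mflops."""
    return (multiplications + additions) * clock_mhz / clock_cycles


# A(20x20) * B(20x3): 1200 multiplications, 1140 additions, 1525 cycles.
large = mflops(20 * 3 * 20, 20 * 3 * 19, 1525)

# A(3x2) * B(2x3): 18 multiplications, 9 additions, 70 cycles.
small = mflops(9 * 2, 9, 70)

print(round(large, 1), round(small, 1))
```

Under the same assumptions the small example reaches only about a quarter of the rate of the large one, which is the loss the text attributes to skews and result transfer.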



5.3 The convolution of two sequences :

Let us consider the convolution of two sequences x(n) and h(n), 


    Y(n) = \sum_{k=-\infty}^{\infty} x(k) \, h(n-k)

Let

    x(n) = 0,  n < 0
    h(n) = 0,  n < 0

then

    Y(n) = \sum_{k=0}^{n} x(k) \, h(n-k),    0 \le n < N


To evaluate the output sequence, the input sequence x(n) is kept in the X
memory and the weights h(n) are kept in the cell array as shown in figure 5.2. The
initial results are kept in the YA memory. In this case these are all zeros.

To execute this algorithm the interface unit reads the initial results from the
YA memory and the input data from the X memory and writes these data into the Y
queue and the X queue of cell 1, respectively. Cell n reads the X data from
its X queue and multiplies it with the first weight from its data memory. This
multiplication result is added to the partial result from the Y queue (written by
cell n-1). The result of the addition is written to the Y queue of cell n+1.
The last cell (cell N) writes the partial results into the YB memory. The IFU then
reads the partial results back from the YB memory, restarts reading the X data
from the first location in the X memory (by clearing the X counter), and writes the
data read into the Y queue and X queue of cell 1 respectively. In this second
phase cell n uses the next (second) weight from its data memory to multiply the X
data. This continues till the last weight in the cell array is used. Thus, at the
end, the final results are stored in the YB memory.

[Figure 5.2: the IFU feeds a linear array of cells whose last cell returns its
results to the IFU. Each cell has the ports xin, xout, yin and yout and holds a
weight h.]

The cell specifications

    xout = xin
    Y = yin + h * xin
    yout = Y

Fig. 5.2 Convolution algorithm.

The microprograms for the IFU and cell 1 are given below. The last
instruction for the last cell (here it is cell 3) should be 'cont' instead of 'jpcnf'
to indicate 'program termination' to the other cells.


; PROGRAM FOR THE IFU, TO CONVOLVE TWO SEQUENCES.

N equ 10
no_of_coeff equ 2 ;Number of weights stored per cell.
cell_N equ 3

        wrcntr(c0) & 2 8000h+N-3 & ken & clrx & clryb & clrya
        rdx & yrtr(r0) & dsel & 2 8000h-2 & ken
loop:   dccntr(c0) & wrxq & rdya
        jda(sign) loop & ken & rdx & wryq
        wrxq
        wrcntr(c1) & 2 no_of_coeff-2 & ken & rst
loop1:  dccntr(c1) & yinc(c0)(r0)
        jda(sign) end1 & ken
        rtd(r0) & dstb
        wrcntr(c2) & den
next1:  wrcntr(c0) & 2 8000h+cell_N-2 & ken
next:   dccntr(c0) & rdyb
        jda(sign) next & ken & wryq
        dccntr(c2)
        jda(sign) next1 & ken
        wrcntr(c0) & 2 8000h+N-3 & ken & clrx
        rdx
loop2:  dccntr(c0) & wrxq & rdyb
        jda(sign) loop2 & ken & rdx & wryq
        wrxq
        jda(unconditional) loop1 & ken
end1:   jpcnf


Since a cell has two weights (coefficients) in its data memory, the X input
data has to be routed twice through the array. In the first routing, the first
three weights, from the first location of the data memories of the three cells,
are used in the convolution process and the partial results are written into the
YB memory. In the second routing the next three weights are used and the final
results are written into the YB memory. Thus the same cell is used twice, which
reduces the number of cells required.
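
The two routings can be sketched in Python as follows. This is a simplified model under stated assumptions, not the microprogram: the name `convolve_in_passes` is hypothetical, the weight used by cell c in routing p is assumed to be h(p*cells + c), and the relative delay between the X and Y streams is modelled directly by the index n-k instead of by queue timing.

```python
def convolve_in_passes(x, cell_weights):
    """cell_weights[c][p] is the weight used by cell c during routing p."""
    n_cells = len(cell_weights)
    n_passes = len(cell_weights[0])
    yb = [0.0] * len(x)                      # initial results: all zeros (YA memory)
    for p in range(n_passes):                # one routing of x(n) through the array
        y = yb                               # the IFU feeds back the partial results
        for c in range(n_cells):             # each cell: yout = yin + h * xin
            k = p * n_cells + c              # assumed position of this weight in h(n)
            h = cell_weights[c][p]
            y = [y[n] + (h * x[n - k] if n >= k else 0.0) for n in range(len(x))]
        yb = y                               # the last cell writes the results to YB
    return yb


x = [1.0, 2.0, 3.0, 4.0]
weights = [[1.0, 4.0], [2.0, 5.0], [3.0, 6.0]]   # 3 cells, 2 weights each
result = convolve_in_passes(x, weights)
print(result)
```

With these weights the six coefficients are h = 1, 2, 3, 4, 5, 6, and the result equals the first four samples of the ordinary convolution of x with h.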


;PROGRAM FOR A CELL to convolve floating point sequences. Use of feedback
;method to simulate multiple cells using one cell.

cell_no equ 1
N equ 10
no_of_coeff equ 2
cell_N equ 3

        wrcntr(c1) & rst & 2 no_of_coeff-2 & ken & xbiout_constf
        yrtr(r0) & dsel & 2 0 & ken & xbiout_constf
        yrtr(r1) & dsel & 2 8000h-2 & ken & xbiout_constf
start:  wrcntr(c0) & 2 cell_no-2 & ken & xbiout_constf
again:  dccntr(c0)
        jda(sign) next & ken & xbiout_constf
        rdyq & xbyout_yi
        jdr(unconditional) again & ken & xbiout_constf & wryq
next:   wrcntr(c0) & 2 8000h+N/2-5 & ken & xbiout_constf & rdxq &
          xbbport_xi & b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        yrtr(r0) & wrxq & rddm & b_addr(#1) & xbbport_dmout & reg_bwr &
          c_addr(#1) & reg_crd & mul_ben
        rdxq & xbbport_xi & b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd &
          mul_aen
        wrxq & yrtr(r0) & rddm & b_addr(#1) & xbbport_dmout &
          reg_bwr & c_addr(#1) & reg_crd & mul_ben
        rdyq & xbain_yi & alu_ben & moen & a_addr(#3) & reg_awr &
          d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        sadd & wrxq & yrtr(r0) & rddm & b_addr(#1) & xbbport_dmout &
          reg_bwr & c_addr(#1) & reg_crd & mul_ben
        rdyq & xbain_yi & alu_ben & moen & a_addr(#3) &
          reg_awr & d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        aoen & xbyout_aluspout & sadd & wrxq & yrtr(r0) & rddm &
          b_addr(#1) & xbbport_dmout & reg_bwr & c_addr(#1) &
          reg_crd & mul_ben
loop:   wryq & rdyq & xbain_yi & alu_ben & moen & a_addr(#3) & reg_awr &
          d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        sadd & wrxq & yrtr(r0) & rddm & b_addr(#1) & xbbport_dmout &
          reg_bwr & c_addr(#1) & reg_crd & mul_ben & aoen & xbyout_aluspout
        dccntr(c0) & wryq & rdyq & xbain_yi & alu_ben & moen & a_addr(#3) &
          reg_awr & d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        jda(sign) loop & ken & xbiout_constf & aoen & xbyout_aluspout &
          sadd & wrxq & yrtr(r0) & rddm & b_addr(#1) & xbbport_dmout &
          reg_bwr & c_addr(#1) & reg_crd & mul_ben
        wryq & rdyq & xbain_yi & alu_ben & moen & a_addr(#3) & reg_awr &
          d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        sadd & wrxq & yrtr(r0) & rddm & b_addr(#1) & xbbport_dmout &
          reg_bwr & c_addr(#1) & reg_crd & mul_ben & aoen & xbyout_aluspout
        wryq & rdyq & xbain_yi & alu_ben & moen & a_addr(#3) &
          reg_awr & d_addr(#3) & reg_drd & alu_aen & rdxq & xbbport_xi &
          b_addr(#0) & reg_bwr & c_addr(#0) & reg_crd & mul_aen
        aoen & xbyout_aluspout & sadd & wrxq & yrtr(r0) & rddm & b_addr(#1) &
          xbbport_dmout & reg_bwr & c_addr(#1) & reg_crd & mul_ben
        wryq & rdyq & xbain_yi & alu_ben & moen & a_addr(#3) & reg_awr &
          d_addr(#3) & reg_drd & alu_aen
        sadd & aoen & xbyout_aluspout
        wryq
        aoen & xbyout_aluspout
        wryq & moen & xbyout_mresult
        dccntr(c1) & wryq & yinc(c1)(r1)
        jda(sign) end2 & ken & xbiout_constf
        rtd(r1) & dstb
        wrcntr(c2) & den
next1:  wrcntr(c0) & 2 8000h+cell_N-2 & ken & xbiout_constf
again1: dccntr(c0) & rdyq & xbyout_yi
        jda(sign) again1 & ken & xbiout_constf & wryq
        dccntr(c2)
        jda(sign) next1 & ken & xbiout_constf
        jda(unconditional) start & ken & xbiout_constf & yinc(c0)(r0)
end2:   jpcnf


To see the results, the YB memory should be read using the 'rdyb' simulator
command.

Table 5.1 depicts the operations done by the IFU and the cell array on the
various data items during the execution of the convolution algorithm. The multiplier
in a cell uses two registers, 'mul_A' and 'mul_B', and the ALU uses two registers,
'alu_A' and 'alu_B'. The result of a multiplication is stored in 'mresult' and the
output of the ALU is stored in 'aresult'.


Performance results :

1) For a system with 3 cells, each having 2 weights (6 weights h(n) in total), and
with 100 input data ( x(n), N = 100 ), the SASP simulator takes 465 clock cycles to
execute the above convolution algorithm. In this convolution, the number of floating
point multiplications is approximately 100*6 = 600 and the number of floating point
additions is 100*5 = 500. Thus for a clock rate of 10 MHz, a computation rate of
24 Mflops is achieved.

2) If instead of two weights only one weight per cell is used, then the total
number of floating point operations is 100*3 + 100*2 = 500. The simulator takes 240
clock cycles to execute this program. This gives a computation rate of 21 Mflops.

5.4 CONCLUSIONS: 

The simulator is a useful tool to develop the programs and to evaluate the
performance of the SASP array using different array sizes. The programs
developed give the expected results. This shows that the simulator works as
intended and that the facilities provided are adequate.



6. THE META-ASSEMBLER


6.0 INTRODUCTION :

A two-phase meta-assembler is developed. The software is designed to
assemble microprograms for several different microprogrammable processor
architectures.

A conventional assembler is a dedicated piece of software because it only
recognizes the symbols which define the instructions of one particular machine
and word size. The main feature of microassemblers which distinguishes them from
other assemblers is the redefinable multiple-field format of the object code.

A microprogram segment must specify many things: the sequence of the
microprogram control flow, control codes for the ALU and other chips like the
address generator, register addresses, timings and enabling conditions for latches
and switches, and constants from a very few to many bits in length for comparison,
preloading or masking. The control code groups may be bit patterns for direct
control of gates, or they may be encoded functions.

The typical microinstruction, then, is a bit pattern of several fields; each
of the fields may have a different length in bits. For specifying the content of
each field an assembly language requires multiple assignments. This
microassembler allows the assignment of all the field bit patterns as a group.

The typical line of a microprogram, therefore, differs from a conventional
word computer instruction line in that it has multiple "opcodes". Furthermore,
different opcode patterns might call for different field groupings in successive
microinstructions. A jump instruction, for example, might call for only two fields:
the field specifying the jump function and a long field giving a jump address within
the microprogram memory. On the other hand, an ALU operation might require
several short fields containing codes giving the first and second operand source
locations within the register set, the codes for the ALU operation performed, the
destination of the result, and bits to control the handling of carries and
condition codes.

The microassembler developed in this thesis can be redefined to translate
the source code for many different machines. Because of this redefinability this
assembler is also called a meta-assembler. This is achieved by splitting the
assembly into two distinct phases.


[Figure 6.1:
 A) DEFINITION PHASE: instruction length, fields, opcodes and defaults are
    translated into the microinstruction formats (compact definition file).
 B) ASSEMBLY PHASE: load addresses, labels, instructions (mnemonics) and constants
    are translated, using the compact definition file, into the assembly listing
    and the microprogram object code.]

Fig. 6.1 Phases of the meta-assembler.




6.1 DEFINITION PHASE:

The first phase or definition phase processes a file containing a definition
of the field format of an instruction word in the target machine. Each field of the
word is defined by its name, position in the word, width and default value. The file
also gives a set of mnemonic tags to be associated with opcodes, the association of
these tags with appropriate formats and field values, the combinations of values
which are not legal, the branch instructions which need a branch address to be given
in the data field, and the opcode bit notations with their associated symbols. This
information is translated into an intermediate format and stored in a file.

6.2 ASSEMBLY PHASE:


The second, or assembly, phase is more like the assembly language processing of
a fixed-architecture computer. The basic task is to scan lines of input symbols,
translating mnemonic codes, expressions and controls into sequences of binary
microinstructions. Fields specified in the definition phase are pulled together,
aligned, and stuffed into a word called a microinstruction. The output of this phase
is the microprogram object code.
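
As an illustration of how fields are aligned and stuffed into a microword, here is a minimal Python sketch. It is not the thesis implementation (which is written in C): `pack_fields` is a hypothetical helper, and the 45-bit word and the bit positions used in the example are borrowed from the example system of Appendix A.

```python
def pack_fields(fields, word_width=45):
    """fields: list of (low_bit, high_bit, value); returns the microword as an int."""
    word = 0
    for low, high, value in fields:
        width = high - low + 1
        if high >= word_width or value < 0 or value >= (1 << width):
            raise ValueError("field value does not fit")
        word |= value << low                 # align the field at its bit position
    return word


# sequencer (bits 38-44) = 0000000, data (bits 22-37) = 5, mac (bits 2-7) = 100000
word = pack_fields([(38, 44, 0b0000000), (22, 37, 5), (2, 7, 0b100000)])
print(format(word, "045b"))
```

Fields that are not listed simply stay at zero here; a fuller version would first fill every field with its declared default.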

6.3 ASSEMBLY LINE:

A typical assembly line should start with an alphanumeric label and be
followed by an instruction-flow opcode (sequencer instruction). Blanks and tabs are
allowed as separators among the items in the lines. Because of the generally
complex structure of the microinstruction format, a variable number of items
follow. One microinstruction is allowed to spill over several lines. The assembler
allows for missing opcodes and values when a programmer desires a default
quantity to be inserted in the corresponding field. Comments are allowed, starting
with a semicolon and ending with the new line character.


6.4 IMPLEMENTATION : 

Both phases of the assembler are implemented in the C language. The original
goal of developing this software was to provide software support for the
linear systolic array processor, which is under development. This has been
achieved. The only shortcoming of this meta-assembler is the absence of a
macroprocessing facility, which can be added.

The two-phase assembly has one more advantage: the intermediate
(compact) definition file can be used by a general purpose symbolic debugger to
provide symbol definitions for data entry and display while debugging.

The flowchart of the program for the assembly processing phase (phase 2) is as
shown in fig. 6.2.



[Figure 6.2 (flowchart, continued over several pages). The recoverable boxes of
PASS2 are:

    PASS2 start: Error_count = 0 ; program_loc_counter = 0 ;
                 Data_loc_counter = 0 ; line_no = 0
    Get line ; line_no++
    EOF ?  If yes, test Error_count = 0, give message and STOP
    Comment ?
    Split line into strings
    Split a string into label, word1, word2 and word3
    word1 = NULL ?
    word1 or word2 a directive ?
    Is it a code_line ?
    Split word1 into mnemonics and operands, separated by brackets
    Test the last character on the line
    program_loc_counter++ ; write microword in the object file
    Consider next string]

Fig. 6.2 Flowchart for PASS2 of the ASSEMBLY PHASE





7. CONCLUSIONS 


CONCLUSIONS: 

A systolic array signal processor (SASP) has been described. It is a linear
array of microprogrammed cell units connected to an external host through an
interface unit.

A simulator for the SASP is developed, which can be used to develop and
debug the programs for the SASP system. The simulator is a useful tool for
evaluating the performance of the system. It provides a tool for testing programs
to be run on the SASP hardware. It also provides the facility of running the
program under user control and estimates the run time in number of cycles.

A meta-assembler is developed, which is used for assembling the programs
for the SASP system. The meta-assembler is generalized and redefinable, so that it
can be used for any microprogrammed system or a microprocessor with fixed
instruction opcode size. The meta-assembler has been thoroughly tested on sample
programs.

The programs developed for the SASP give the expected results. This
shows that the system simulator works as intended and that the facilities provided
by the simulator are adequate. The simulator has been tested using the matrix
multiplication and the convolution algorithms on a few sample examples.

SUGGESTIONS FOR THE FUTURE WORK.


1) More algorithms can be developed for a linear systolic array using the
simulator, e.g. FFT, AR filtering, and matrix operations like finding eigenvalues.
This will also help to evaluate the performance of the SASP system.

2) The simulator blocks (sequencer, address generator and floating point
units) can be used to simulate other microprogrammed architectures also.

3) The modularity of the simulator program permits it to be modified to match
further developments in the hardware design, so that the simulator can be used
for developing and debugging programs for the target system.

4) The meta-assembler is a very useful tool, which can be used for any
microprogrammed architecture. The addition of a MACRO facility to the meta-assembler
will make it more powerful and complete.

5) An optimizing compiler can be developed, which will hide the low-level
details of the system and allow the user to concentrate more on the parallelism at
the array level. The W2 language developed for the CMU Warp processor [Anna87] is
an example of such software support.

APPENDIX A


MANUAL FOR META-ASSEMBLER 

A.0 INTRODUCTION :

There are two phases of the assembly. The first phase processes a file
containing a definition of the format of an instruction word in the target machine.
This information is translated into an intermediate format, and held in a file on
backing store.

The second or assembly phase reads the translated definition file from
backing store and uses the symbols contained within it to process the
programmer's source code. The output from the second phase is a text file with one
assembled microinstruction per line. The output is in a simple format coded in
binary or hexadecimal.

A.1 PHASE 1 - DEFINITION PROCESSING


The definition file can be written in any standard text editor. Upper
case and lower case letters are not distinguished.

This phase is best illustrated by a simple example. In the example reserved
words appear in upper case to improve readability (not because it is a
requirement) and a semicolon introduces a comment which is terminated by the end
of the line. There are no layout restrictions.



A definition file starts with,

TITLE demonstration

The TITLE directive has no effect on the translation and is included as an
aid to quick program identification. If TITLE is omitted, translation is aborted
early on. Thus little time is wasted on a probably erroneous file.

The output from the definition translator can be directed to a file with a
particular name. The directive,

DEFN_FILE "test.dat"

causes the output to be sent to the file with the name within the quotation
marks. This file will then be the input to the phase 2 translator.

Let us consider an example of a system consisting of a MAC, an address
generator and a sequencer. The word is 45 bits wide and is divided into fields
as shown in fig. A.1. There are 8 fields in total.

The next step is to give the word size of the target processor. Here it is
45 bits. The input is
WORD WIDTH 45 BITS

The reserved words WIDTH and BITS are optional.

The next step in the definition file is to give the program memory location
width. Generally it is equal to the word width. The input is
PROGRAM_LOC_WIDTH 45 BITS

BITS is optional.

The next line should be
DATA_LOC_WIDTH 16 BITS

BITS is optional. This indicates the data memory location width. In this
example, it is 16 bits.

[Figure A.1: layout of the microinstruction word of the example system, from MSB
to LSB.]

Fig. A.1 Example system.


The fields within the microinstruction word are defined next. The first field
in this example can be defined as,

#sequencer : FIELD 38-44 , WIDTH 7 , DEFAULT cont

The name of the field is 'sequencer'. It starts at bit 38 and ends at bit 44.
The dash '-' between 38 and 44 is essential and no space is allowed in between. The
clause WIDTH 7 is useful because it is checked against the width calculated from
the start and end bit positions of the field, and an error results if a
disagreement is detected. The DEFAULT clause is used to assign a default value to
a field for which no value is specified in phase 2 (assembly phase). It should be
given as DEFAULT mnemonic; no integer or binary value is accepted instead of
the mnemonic. This mnemonic should be present in the VALUES clause, given at the end
of a field declaration.

Within a field or microinstruction, the bits are numbered in ascending order
of significance from right to left. The lowest-numbered bit has the least
significance.
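
The width check described above can be illustrated with a short Python sketch. The regular expression and the function `parse_field` are assumptions made for this illustration only; the actual definition translator is written in C and its parsing rules may differ.

```python
import re

# Hypothetical pattern for a field header such as
#   #sequencer : FIELD 38-44 , WIDTH 7 , DEFAULT cont
FIELD_RE = re.compile(
    r"#(\w+)\s*:\s*FIELD\s+(\d+)-(\d+)\s*,?\s*WIDTH\s+(\d+)\s*,?\s*DEFAULT\s+(\w+)",
    re.IGNORECASE,
)


def parse_field(line):
    m = FIELD_RE.match(line.strip())
    if m is None:
        raise ValueError("not a field declaration")
    name, low, high, width, default = m.groups()
    low, high, width = int(low), int(high), int(width)
    if high - low + 1 != width:              # WIDTH must agree with the bit range
        raise ValueError("WIDTH disagrees with bit positions")
    return {"name": name, "low": low, "high": high,
            "width": width, "default": default}


field = parse_field("#sequencer : FIELD 38-44 , WIDTH 7 , DEFAULT cont")
print(field)
```

A declaration such as `FIELD 38-44 , WIDTH 6` is rejected by this sketch, mirroring the error the text describes.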

The next clause in a field is

OPCODE_BIT_NOTATIONS [
    kk (
        ;The values and symbols should be written on the next line.
        00 flag
        01 carry
        10 equal
        11 nocondition
    )
    cc (
        c0-c3
    )
]


This clause is optional and can be omitted. Here kk and cc are bit notations.
Thus this clause gives information about the bit notations used in the opcodes
defined in the clause VALUES. It gives the bit notations and the values taken by
them for different symbols. These symbols are then used in the operand field of
an instruction in the assembly phase.

If a bit notation cc represents registers C0-C3, then it is not necessary to
write every register's name and value for the bit notation; just C0-C3 is sufficient.
The only restriction is that the bit notation letters and the registers' names
should start with the same letter.

In the bit notation some characters are not allowed: ')', '(', '[', ']' and the
numbers '0'-'9'. All the characters in a bit notation must be the same. The
length of the bit notation string should be less than 32 characters. The length of
a symbol should not exceed 32 characters. The number of symbols used for a bit
notation should not exceed 32. It is also limited by the length of the bit notation,
e.g. for kk (length 2) at most 2^2 = 4 symbols should be present, since only 4
combinations of bit notation values are possible. In the above example, for
the bit notation 'kk', there are 4 symbols: 'carry', 'equal', 'nocondition' and 'flag'.
The clause starts with '[' and ends with ']'. Each bit notation's symbols and
associated bit notation values are embedded in brackets '(' and ')'.
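
The symbol rules above can be sketched in Python. The function `expand_symbols` is hypothetical, and one assumption goes beyond the text: a register range such as c0-c3 is taken to assign register n the binary value n, padded to the length of the bit notation.

```python
def expand_symbols(notation, entries):
    """entries: items like "00 flag", or a register range like "c0-c3"."""
    bits = len(notation)
    table = {}
    for entry in entries:
        parts = entry.split()
        if len(parts) == 2:                          # explicit "value symbol" pair
            value, symbol = parts
            table[symbol.lower()] = value
        else:                                        # register range such as c0-c3
            first, last = parts[0].lower().split("-")
            prefix = first[0]
            for n in range(int(first[1:]), int(last[1:]) + 1):
                # Assumed encoding: register number written in binary.
                table[f"{prefix}{n}"] = format(n, f"0{bits}b")
    if len(table) > 2 ** bits:                       # at most 2^length symbols
        raise ValueError("too many symbols for this bit notation")
    return table


kk = expand_symbols("kk", ["00 flag", "01 carry", "10 equal", "11 nocondition"])
cc = expand_symbols("cc", ["c0-c3"])
print(kk, cc)
```

The final check enforces the limit of 2^length symbols; the separate limits of 32 characters and 32 symbols mentioned in the text are left out for brevity.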

The next clause in the field is

INSTR_NOT_AVAILABLE [
    branch(sign)(c0)
    branch(sign)(c1)
]

This clause is also optional. In some mnemonics some operand (condition)
combinations are illegal and no opcode is available for such instructions.
Such instructions (with operands) should be written in this clause. The symbols
in the brackets '(' and ')' are operands/conditions. If an instruction present in
this clause appears in the source code (phase 2), the assembler gives an
error. The next clause in the field is


BRANCH_INSTR_WHICH_NEED_DATA [
    ABSOLUTE (
        jda
    )
    RELATIVE (
        jdr
    )
]

This clause is also optional. It is mainly useful for the sequencer field. In it,
the mnemonics for branch instructions which need the address to be in the data
field are written without their operands/conditions. In phase 2, for these
instructions the value corresponding to the label or expression written next to
the instruction is filled in the 'data' (constant) field. The name of the data
(constant) field must be "data" and no other name is allowed.

For mnemonics written under the ABSOLUTE clause, in phase 2 the absolute
address corresponding to the label (/expression) given in the next word (word2)
after the instruction is put in the 'data' field. For the mnemonics written under
the RELATIVE clause, in phase 2 the relative address corresponding to the
label/expression is put in the 'data' field.

The next clause in the field is
VALUES [
    idle 0010000
    jda  111kk11
    cont 0000000
]

In this clause, the mnemonics and corresponding opcodes for the field are
written. The opcode width should match the width given for the field. This
clause is compulsory, since every field has some default given in terms of a
mnemonic. Therefore at least one mnemonic should be present in this clause.
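
To make the role of the bit notations in an opcode concrete, the following Python sketch substitutes operand bits into an opcode pattern. The function `encode` is hypothetical; the patterns 111kk01 (jdr) and 100kkcc (branch) and the condition codes are taken from the sequencer field of the complete example file as far as they can be read from the listing.

```python
def encode(pattern, operand_bits):
    """pattern such as "111kk01"; operand_bits maps a notation letter to its bits."""
    out = []
    used = {letter: 0 for letter in operand_bits}
    for ch in pattern:
        if ch in "01":
            out.append(ch)                   # fixed opcode bit
        else:
            bits = operand_bits[ch]
            out.append(bits[used[ch]])       # consume the operand bits left to right
            used[ch] += 1
    return "".join(out)


# jdr(sign): kk = 11 for 'sign' in the complete example file
jdr_sign = encode("111kk01", {"k": "11"})
# branch(flag)(c2): kk = 10 for 'flag', cc = 10 for register c2 (assumed encoding)
branch_flag_c2 = encode("100kkcc", {"k": "10", "c": "10"})
print(jdr_sign, branch_flag_c2)
```

The result always has the same length as the pattern, which is why the opcode width must agree with the declared field width.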


The other fields for the given system are defined in a similar way. The
contents of the definition file look as follows.


TITLE TRIAL
DEFN_FILE "TEST.DAT"
WORD WIDTH 45 BITS
DATA_LOC_WIDTH 16
PROGRAM_LOC_WIDTH 45

;serial no. 1
#sequencer: FIELD 38-44, WIDTH 7, DEFAULT cont
OPCODE_BIT_NOTATIONS [
    kk (
        00 unconditional
        01 notflag
        10 flag
        11 sign
    )
    cc (
        C0-C3
    )
    ii (
        I0-I3
    )
]
INSTR_NOT_AVAILABLE [
    branch(sign)(c0)
    branch(sign)(c1)
    branch(sign)(c2)
    branch(sign)(c3)
]
BRANCH_INSTR_WHICH_NEED_DATA [
    ABSOLUTE (
        jsa
        jda
    )
    RELATIVE (
        jsr
        jdr
    )
]
VALUES [
    ;jump & branch instruction
    jda     111kk11
    jdr     111kk01
    jsr     111kk10
    branch  100kkcc
    dccntr  01100cc
    anair   0110110
    ;miscellaneous instructions
    cont    0000000
    idle    0010000
]

;serial no. 2
#data: FIELD 22-37, WIDTH 16, DEFAULT zero
VALUES [
    zero 0000000000000000
]

;serial no. 3
#data_enable: FIELD 21-21, WIDTH 1, DEFAULT disable
VALUES [
    enable  1
    disable 0
]

;serial no. 4
#address_generator: FIELD 11-20, WIDTH 10, DEFAULT nop
OPCODE_BIT_NOTATIONS [
    cc (
        c0-c3
    )
    rrr (
        R0-R7
    )
    rrrr (
        r0-r15
    )
    bb (
        b0-b3
    )
    cc (
        c0-c3
    )
    ii (
        i0-i3
    )
]
VALUES [
    yinc  1011ccrrrr
    llccbbOmr
    yrtr  000101rrrr
    yrU>  0011bbrrrr
    Wrtc  0010ccrrrr
    m     00001110ii
    WOT   0111bbrrrr
    yxor  0101bbrrrr
    nop   0000000000
]

;serial no. 5
#ad_gen_pin: FIELD 10-10, WIDTH 1, DEFAULT nodsel
VALUES [
    dsel   1
    nodsel 0
]

;serial no. 6
#data_mem_control: FIELD 8-9, WIDTH 2, DEFAULT noTN
VALUES [
    rd    11
    wr    10
    ntx^i 00
]

;serial no. 7
#mac: FIELD 2-7, WIDTH 6, DEFAULT nop
VALUES [
    nop     000000
    xabus   001000
    010000
    001101
    mul     100000
    muladd  100010
    mulnadd 100011
    busals  011101
]

;serial no. 8
#mac_rnd_pins: FIELD 0-1, WIDTH 2, DEFAULT nornd
VALUES [
    rnd14 01
    rnd15 10
    nornd 00
]


The microinstruction format has now been completely defined. If no errors
are detected during processing of the definition file, this microinstruction format
is written into a compact file named "test.dat" (the name is given in the DEFN_FILE
clause).


A.1.1 PROCESSING OF THE USER FILE:


To process the user file, the user should give the following command.
DEF FILENAME <cr>

or

DEF <cr>
FILENAME <cr>

The DEF program processes the user definition file and generates the compact
definition file, which is then read by the assembler (phase 2) to assemble the
source code.


A.2 PHASE 2 - ASSEMBLY PHASE

The second phase is a conventional two-pass assembly. First the compact
definition file is read and then it starts processing the programmer's source code.

The output files of the assembler are,

-an ASCII file containing the resulting object code (.obj)
-a list file, with error listing if present (.lst)
-a symbol file (.sym)

The errors are also displayed on to the standard output device.

A.2.1 Running the assembler:-

To invoke the assembler, the command form is:

HEA^^ [-switch][-switch] <cr>

The switches are optional and are

-l To create list file.
-b To create binary coded object file.
   (Hex coded object file is by default)

After pressing <cr> the assembler asks for the source file name and the definition
file name.



A.2.2 Conventions:-

This section covers the language conventions (symbols and constants) used
in the source code file. The assembler does not distinguish between upper case
and lower case letters.

A.2.2.1 Symbol Conventions

All symbol names in the source code file must be unique. Symbol names
should be of length less than 32 characters. A valid symbol starts with a letter
followed by any mix of letters, numbers or underscores. User defined symbols
cannot be the same as assembler keywords. The keywords are

include, equ, pmw, pmfw, dmw, dmfw, proc, endp, p_org, d_org, define and
dup.

A.2.2.2 Constants

The assembler accepts binary, octal, hexadecimal and decimal numerical
constants. They are postfixed with B or b, O or o, H or h, D or d respectively, with
the default being decimal. A hex number must start with a numeric.
e.g. 1234h, 0B123h, 1770o, 11111b, 1234

Symbolic constants are assigned by the 'EQU' directive,
e.g. symbol1 EQU 234h

Symbolic constants can be used anywhere to replace numeric constants.
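
A minimal Python sketch of these constant rules is given below. The function `parse_constant` is a hypothetical illustration, not the assembler's C routine; it assumes the postfix is always the last character and rejects any number whose first character is not a digit.

```python
def parse_constant(text):
    """Convert a constant such as 1234h, 0B123h, 1770o, 11111b or 1234 to an int."""
    text = text.strip().lower()
    if not text:
        raise ValueError("empty constant")
    bases = {"b": 2, "o": 8, "h": 16, "d": 10}
    if text[-1] in bases:                    # explicit postfix selects the base
        base, digits = bases[text[-1]], text[:-1]
    else:                                    # no postfix: decimal by default
        base, digits = 10, text
    if not digits or not digits[0].isdigit():
        raise ValueError("a number must start with a numeric digit")
    return int(digits, base)


values = [parse_constant(s) for s in ("1234h", "0B123h", "1770o", "11111b", "1234")]
print(values)
```

The leading-digit rule is what lets the assembler tell the hex constant 0B123h apart from a symbol named B123h.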


A.2.3 SOURCE CODE FILE:


In a source code file, the programmer can have several modules and a main
program. The modules are declared with,

name of module PROC

and ended with ENDP. The body of the source code file consists of two
sections, the code section and the declaration section. The sections can be placed
in any order or anywhere in the file.

Comments can be inserted anywhere in a source code file, starting with ';'.
Every character after ';' on that line is treated as a part of the comment.

A.2.3.1 INCLUDE directive:

A number of assembly files can be included in the original source file by this
declaration, e.g. include test2.asm

It includes the source file test2.asm, and this file is assembled first, before
assembling the next line of the present source file. Nesting of include files
is limited to 10 levels. But this may cause a problem (on the PC-DOS O.S.) with the
number of file handles open at a time, if the line files=15 (or a greater number)
is not added to the config.sys file. The number of include files that can be
included by a source file is unlimited.

A.2.3.2 Declaration Section:-

The declaration section uses assembly directives to declare assembly
constants, program memory data (variables), data memory data (variables), and the
origin of the data and program memory that follows the origin command.

For variable declaration in data memory, 'dmw' (for integer data) and
'dmfw' (for single precision floating point data) are the assembly directives to be
used. A variable name can be given before the 'dmw' or 'dmfw' directive, e.g.
differ dmw 25,25.4'rah3344iil-0b2233h,55443 



jiJO 


dn>H 23*M77^ 

The data for each location should be separated by ',' and no space is
allowed until the declaration of the last location on that line.

In case of a program memory variable, the directive used is 'pmw' or 'pmfw',
e.g. buffer2 pmw 255h

Each data item is placed in a full word.

To duplicate the same data in memory, the 'dup(constant)' directive can be used.
The constant represents a number which can be placed in one memory location. A
floating point constant cannot be used here. Multiple locations cannot be
duplicated,

e.g. 100dup(200h)

200h is duplicated in 100 locations.

e.g. 100dup(200h,100h) is invalid.

For 10dup(?), 10 locations will be filled with zeros.
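
As a rough illustration of the rules above (one integer constant per dup, '?' meaning zero fill), the expansion could be modelled as follows. The function name `expand_dup` and the regular expression are assumptions made for this sketch; they are not taken from the assembler itself.

```python
import re


def expand_dup(item: str) -> list:
    """Expand 'NNNdup(constant)' into a list of NNN identical words.

    '?' stands for zero fill. A comma-separated list or a floating
    point constant inside the parentheses is rejected.
    """
    m = re.fullmatch(r"(\d+)dup\(([^(),]+)\)", item.strip().lower())
    if m is None:
        raise ValueError(f"invalid dup expression: {item!r}")
    count, operand = int(m.group(1)), m.group(2).strip()
    if operand == "?":
        value = 0
    else:
        # radix suffix as in the numeric constant rules: h, o, b or none
        radix = {"h": 16, "o": 8, "b": 2}.get(operand[-1], 10)
        digits = operand[:-1] if radix != 10 else operand
        value = int(digits, radix)  # ValueError for e.g. '1.5'
    return [value] * count


if __name__ == "__main__":
    print(expand_dup("4dup(200h)"))
    print(expand_dup("3dup(?)"))
```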

The starting location of the program or data memory area can be declared by the
P_ORG and D_ORG directives respectively, e.g.

P_ORG 100h

This directive can be used anywhere in the source file. Thus different code
segments can have different starting locations.

A.2.3.3 Code Section:-

A microinstruction code statement has the following format,
label: instr1 & instr2 & instr3 & ... &
instrn-1 & instrn


The instructions for each field are separated by '&'. The end of the
microinstruction is indicated by the end of line, if the last character on that line
is not '&'. Thus a microinstruction can be several lines long. The instructions in
the microinstruction can be written in any order. If a particular field is omitted
then the default value is written by the assembler at the corresponding position in
the microword. If, for a field, a numeric expression or a symbol is to be written
instead of an instruction defined in the definition file, then the field's serial number
(as in the definition file) should be written before the expression or symbol. In
this case the value corresponding to the expression is put in the field position.
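
The way omitted fields fall back to their defaults can be sketched as follows. The field table below is hypothetical and heavily abbreviated: the bit positions follow the style of the definition files in Appendix C, but the names, the dictionary layout and the `assemble` function are assumptions made only for this illustration.

```python
# Hypothetical field layout: name -> (low bit, high bit, default value).
FIELDS = {
    "sequencer": (49, 55, 0),
    "data": (32, 47, 0),
    "address_generator": (22, 31, 0),
    "clear_x_mem": (14, 14, 1),  # a field whose default value is not zero
}


def assemble(values: dict) -> int:
    """Pack field values into one microword; omitted fields get defaults."""
    unknown = set(values) - set(FIELDS)
    if unknown:
        raise KeyError(f"unknown field(s): {sorted(unknown)}")
    word = 0
    for name, (low, high, default) in FIELDS.items():
        width = high - low + 1
        value = values.get(name, default)
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit field {name!r}")
        word |= value << low
    return word


if __name__ == "__main__":
    # only the 'data' field is written; every other field keeps its default
    print(hex(assemble({"data": 0x1234})))
```

Since every field has a default, the order in which the instructions appear on the source line does not affect the assembled word.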

Arithmetic expressions containing symbols and numeric constants can be used
throughout the assembly,

e.g. buffer+1, buffer2+3, buffer+buffer2+5

e.g. buffer+8 will indicate the address of the 9th memory location in the
buffer.

The label field is optional, and when it is present, a value corresponding to
the address of the present program location is assigned to the label symbol.

An instruction may have operands when there are opcode bit
notations present in the opcode (given in the definition file) of that instruction. The
value corresponding to each operand symbol is put in place of the opcode bit notation. The
order in which the operands should be written (separated by brackets) depends on
the order in which the corresponding bit notations are written in the opcode field,
e.g.

yinc 1011ccrrrr ;opcode

is an instruction for the address generator, where yinc is the mnemonic and 1011ccrrrr is
the opcode. cc selects one of the c0-c3 registers, rrrr selects one of the r0-r15
registers. Thus, for this instruction, there are two operands, and the instruction
should be written as

yinc(c0)(r0)

if the operands are c0 and r0.

The instruction yinc(r0)(c0) is invalid.
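
The substitution of operands into opcode bit notations can be sketched as below. This is a minimal model, not the assembler's implementation: the function name `encode` is hypothetical, and matching the operand's leading letter against the notation letter (c0 for cc, r0 for rrrr) is an assumption that happens to fit the lower-case notations shown here.

```python
import re


def encode(template: str, operand_text: str) -> str:
    """Fill opcode bit notations with operand values.

    template     e.g. '1011ccrrrr' (runs of letters are bit notations)
    operand_text e.g. '(c0)(r0)', in the order the notations appear
    Returns the opcode as a string of 0s and 1s.
    """
    operands = re.findall(r"\(([^()]*)\)", operand_text)
    runs = list(re.finditer(r"([a-z])\1*", template))
    if len(operands) != len(runs):
        raise ValueError("operand count does not match the bit notations")
    out = template
    for run, op in zip(runs, operands):
        letter, width = run.group(1), len(run.group(0))
        m = re.fullmatch(r"([a-z])(\d+)", op.strip().lower())
        if m is None or m.group(1) != letter:
            raise ValueError(f"operand {op!r} does not fit notation {letter!r}")
        value = int(m.group(2))
        if value >= (1 << width):
            raise ValueError(f"operand {op!r} out of range")
        # the replacement has the same length, so later positions stay valid
        out = out[:run.start()] + format(value, f"0{width}b") + out[run.end():]
    return out


if __name__ == "__main__":
    print(encode("1011ccrrrr", "(c0)(r0)"))
    print(encode("1011ccrrrr", "(c2)(r5)"))
```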

The number of such operands for an instruction is limited to 10. An
instruction should be written with no spaces in between.

For branch instructions which need the jump address to be written in the 'data'
field, instead of writing the address expression as a separate expression for the
'data' field (with the 'data' field's serial number, as given in the definition file), the jump
address can be written in the form of an expression containing constants and
symbols after the branch instruction mnemonic, before the next instruction of
another field starts. If the address expression is written in the branch instruction,
then for relative branching the relative branch address is put in the 'data' field,
e.g. jda(sign) label1 & ....

Here jda(sign) is a branch instruction and label1 indicates the jump location.
Thus the address corresponding to label1 is put in the 'data' field. This can also be
written as,

jda(sign) & 2 label1 &

There are two types of branch instructions: 1) those which need an absolute branch
address, 2) those which need a relative branch address. This should be mentioned in the
definition phase.

Once the branch address is written in a branch instruction in a
microinstruction, the 'data' field of that microinstruction should not be written
again, otherwise the assembler gives an error message.


In some machines, there is no separate data field and the branch address
data is given in the opcode of the branch instruction itself. In this case the
address is written as an operand of that instruction. In these operands
labels (symbols) can be used, e.g.

jump loin####### jOpcxxie and nriemoruc 

Here the jump address is given in the operand of the instruction. The bit
notation for this address space is '#', which means it can take values from #0 to
#255 decimal. One can write it as #0 to #0ffh, or #0 to #const1, where const1=0ffh.

Thus the instruction may look like jump(label+2)


A2.4 EXAMPLE - 


An example program is given. The assembler output file (.obj) is also shown.
This program does matrix multiplication on the architecture defined in the
definition phase, as an example.


; c_matrix = a_matrix*b_matrix
d_org 100h


a^matrix 

dmw 

12, 15425, 78,343&»283.49 

#4iN matrix 

b.matrix 

dmM 

i»i,25h25J725, 48,69458^ 

matrix 

c.matrix 

dmw 

IBdupCO) 

rfilK matrix 


; For c_matrix each element is of two words.

N equ 3
K equ 3
M equ 3


file 1 - define.asm


;This program is a sample program to demonstrate use of
;definition/assembly phase of the meta-assembler.

#include define.asm


wrcnlr(CO) & 2 K-2 & enable 
2 HI N & enable & dtitiO) 
itrdOXROl 
yrtc(c0Xr0> 

2 2*M1K & enable & dti(iO) 

itr(iOKrO) 

grtc(o2Xr01 

2 a_malrix & enable & dliCiOl 


iimtializB rO -> counterO 
initialize cO 


initialize o2 
j initialize iO 



pOOOO 

pOOOi 

P0002 

P0003 

P0004 

P0QO5 

p0006 

P0007 

P0008 

P0003 

POOOA 

POOOB 

pOOOC 


pOOOD 

pOOOE 

poocf 







pOOiS 

cxxxxxjiEOBee 

p00i6 

ocax»oooooo 

p0017 

1F4000ACXXDO0 

pOOiS 

iC7FFF200000 

pooie 

OC4OO017i274 

pOOlA 

000000171274 

pOOlB 

000000171264 

pOOlC 

1FC007AOOOOO 

pOOiD 

1CC0043A0800 

pOOiE 

CKlOOOOiOOOOO 

pOOiF 

OOOCXXDIOOOOO 

p0020 

1FC008AOOOOO 

p002i 

1CC003E00000 

p0022 

$$ 

OD4000000000 


Output file: matrix.obj


The assembler also produces a symbol file, which can be used for
disassembly of the object file. The first string on a line is the symbol, followed by
its value. Every file has some character sequence at the start and end of the file
for proper identification when these output files are accessed by other
programs.


lid 

A_MATRIX 0100d
B_MATRIX 0109d
C_MATRIX 0112d
LABEL1 000Fp
LABEL2 0010p
LABEL3 0014p
TEMP1 0019p
TEMP2 001Ep
TEMP3 0022p
llo 


symbol file: matrix.sym


Note: t' stands for <esc>. 
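
A program that reads such a symbol file might look like the sketch below. It rests on two assumptions that the listing suggests but the text does not state: values are hexadecimal, and the trailing letter marks the memory space ('d' for data memory, 'p' for program memory). The function names are hypothetical.

```python
def parse_symbol_line(line: str):
    """Split 'NAME vvvvX' into (name, memory space, address).

    Assumes a hexadecimal value followed by 'd' (data memory)
    or 'p' (program memory).
    """
    name, raw = line.split()
    space = {"d": "data", "p": "program"}[raw[-1].lower()]
    return name, space, int(raw[:-1], 16)


def load_symbols(text: str) -> dict:
    """Build a symbol table, skipping lines that are not 'NAME value' pairs
    (for example the identification sequences at the start and end)."""
    table = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[1][-1:].lower() not in ("d", "p"):
            continue
        try:
            name, space, address = parse_symbol_line(line)
        except ValueError:
            continue  # value part was not a hex number
        table[name] = (space, address)
    return table


if __name__ == "__main__":
    sample = "A_MATRIX 0100d\nB_MATRIX 0109d\nLABEL1 000Fp\n"
    print(load_symbols(sample))
```

Read this way, B_MATRIX lies nine words after A_MATRIX, which agrees with a 3 by 3 matrix of one-word elements starting at the d_org address 100h.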



A.2.5 Error messages


The assembler gives the following error messages.

1. Syntax error

-For a syntax error.

2. ERROR: Number is invalid

-If a numeric constant has invalid characters in it, the assembler gives this
error,
e.g. 23g0h is invalid, since for a hex constant the valid characters are 0-9,
a-f.

3. ERROR: Symbol redefined

4. ERROR: Data field is already defined.

-This error is given for a microinstruction in which the sequencer has a branch
instruction that needs the jump address in the 'data' field, a symbol
expression is written for the jump address in the sequencer instruction, and
the 'data' field is also assigned some value in the same microinstruction.

5. ERROR: Symbol not found

6. Error in line.

-For errors other than the above, this error message is given by the
assembler.

7. ERROR: mnemonic for a field written twice in line.

-When two or more instructions are written for a single field in a
microinstruction, this error is given by the assembler.



APPENDIX B


SIMULATOR MANUAL


B.1 INTRODUCTION:


This simulator provides an easy way to verify systolic array system designs
without committing to hardware development. As shown in figure B.1, the simulator
reads the architecture description file, and the object code files and symbol table
files output by the meta-assembler. An Architecture Description file is input to the
simulator to start a simulator session. The object code files are loaded
interactively. The Symbol Table files are loaded implicitly when the object files
are loaded. The definition files are loaded implicitly when the simulator is run.

By reading the Debug Symbol Table, the simulator interacts with the user
symbolically. The user can make references to variables and program labels using
symbols defined in the user program, avoiding the need to decode the symbols. The
simulator can disassemble a microinstruction, making full use of symbols defined in
the user program.

The simulator is interactive. By reading the Architecture Description file it
configures itself to match the target system hardware.

Using upload/download files, the user can load data memory (to simulate data
generated or loaded by external devices, e.g. the host) into the simulator and later upload
simulator-processed data to a file for subsequent analysis.


A valid architecture file starts with character '®' in the first line. In the
architecture file the user can specify the architecture of the system, i.e. he can specify
the length of the X queue, Y queue and addr queue, and the size of data memory, program
memory, scratch pad memory in a cell, and the x, yb, ya memory sizes. He can also specify
the number of cells. If a parameter is not specified, some
minimum default value is assumed. The values assigned to these parameters should
not be less than the default values, otherwise the default value is assumed.


[Figure B.1: block diagram. Inputs to the SIMULATOR: Architecture Description
file; PM/DM memory image files (.obj); Debug Symbol Table files (.sym); Definition
files for IFU and cell; Upload/Download files. The simulator also takes command
input and produces an information display.]

Fig. B.1 Simulator input output files.





The minimum default values are:

parameter          size
X memory           1024
Ya memory          1024
data memory        1024
program memory     200
X queue            128
Y queue            128
Addr queue         128
scratch pad        512
register file      512
no. of cells       1
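
The rule that a value below the minimum is replaced by the default can be sketched as follows. The dictionary keys are hypothetical names chosen for this illustration (the actual keywords of the architecture file are not reproduced here); the numbers are the minimum default values listed above.

```python
# Minimum default values from the table above; key names are illustrative only.
MINIMUM_DEFAULTS = {
    "x_memory": 1024,
    "ya_memory": 1024,
    "data_memory": 1024,
    "program_memory": 200,
    "x_queue": 128,
    "y_queue": 128,
    "addr_queue": 128,
    "scratch_pad": 512,
    "register_file": 512,
    "cells": 1,
}


def effective_parameters(specified: dict) -> dict:
    """Return the configuration the simulator would assume.

    Unspecified parameters take the default; a specified value that is
    less than the default is ignored and the default is used instead.
    """
    result = dict(MINIMUM_DEFAULTS)
    for name, value in specified.items():
        if name not in MINIMUM_DEFAULTS:
            raise KeyError(f"unknown parameter: {name}")
        if value >= MINIMUM_DEFAULTS[name]:
            result[name] = value
    return result


if __name__ == "__main__":
    print(effective_parameters({"data_memory": 2028, "program_memory": 100}))
```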


An example of an architecture file is as follows:




ya 

2028 

X 

0?f#h 

yb 

mffh 

%Q 

Offh 

yoi 

512 

iddro 

256 

dmem 

2028 

scrat^ioadl 

fcOO 

orcva»»*e» 

41» 

hO-OT-Wlils 

3 


S 

These parameter values can be given in any order.


SteiQl Sfi* 

To start the simulator type
SSIM <cr>

_Arch_file syst.arh


The architecture file lets the simulator know how to configure data memory, program
memory, queue sizes, number of cells in the array, etc. There can be a number of cell
units apart from the interface unit (IFU). The commands given at the simulator
prompt are meant for the command cell. When the simulator is started the
command cell is the IFU, or cell 0. The command cell can be changed by giving the
command "element<cr>" at the simulator prompt.

B.3.1 Loading data memory, program memory, queues, etc.

After the simulator configures the basic memory structure, it prompts with
'SIM>'
for a command.

SIM> lcode<cr>
_file sconv0.obj<cr>

The command reads the object file (.obj) and the symbol table file (.sym) created by
the meta-assembler and loads it into the present command cell's program memory.

To load the data memory of the command cell with integer numbers, type

SIM> ldm<cr>
_Address 0<cr>
_Length 5<cr>
1000o 0abcdfh 01010101b 1234 33333<cr>

Thus hex, octal, binary or decimal input can be given.

To load the data memory with floating-point data, type

SIM> ldm -f<cr>
_Address 0<cr>
_Length 5<cr>
1.23 1.2e12 1234967 1.00e24 24.23456


B.3.2 Command file

When starting a session the user may wish to execute the same set of
commands to bring the simulator to a known state. Instead of entering these
command strings every time, the user can create a command file which contains these
command strings. The command strings appear in the command file exactly as if
typed interactively.

To execute commands in a command file, type

SIM> cmdfile<cr>
_File conv.cmd<cr>

An example of the command file is given below

element
0
lcode
sconv0.obj
lx -f
0
2
1.3
3.4
element
1
lcode
sconv1.obj
ldm -f -fl
0
4
conv.dat

Note that each data input for a load command should be given on a
separate line.


B.3.3 Hardware interrupts


The ♦*»» county* for «»ch c«ll ^ 


i^o>i 


can 


. . nna ot H « « inierrupt 

t* »«l to ir.ltrrtjf.lir^ *cUv*t« 

1 I ifrte 

sourct# for #•r^ r«n »r%l Iff t>y I^T) J^t**^* 

'to 

inUrrupts *♦-.'•! ijm* •iojr««, an int«rrupt i» 

Mil's #tcau#f'-'{ »<• SjTOjUtfir •i*rigio _ jjjj int^rupt »ourc«* 

. I i»u®l (i. to 8? 

enabljos? ao r.*'.s »o^,jfy interrupt 

iriwv*! Um o*r »od 

Sm ••l»rl <or) 


tre disatol**^' 


ar«i tt« 


„rnjw*>#f I (cr ,. 

_p#f »wl U’Vf'l**' <cr-) 

w"iod to 0 

TodU»il3lt »n »f»l»fn4H »ourr«, *in^ly »«t th« intarrupt 


1€SET: 


R«*»i commr^i t •*#! » lh« «y«t«fn to •t»rtir»a pO*itiPf* 
prografn cons^im % af «n Uw r*ll» mna ITU ir« r«Mit to 0- 


All the 


oountBTS' 


and 


B3.4 OPERATION WIM. !» 


i-fady 


to run 




Aftar conf jgHjr'ifHp tru# Hur th« fimul»tor »* ' " cjctend 

6.uUt»-™=*' 


Drogrtm Th* simsilaicif tJii#? »l#« m oo« of lh» lhm« 
"'octe, and Sir>«jlt St«p nmM 




, fhiS 

wod®- 1*^ 

The ctefault and ttarling c»«riUon mod® is lh« 

a br««k oo^ 

tht sAmuiaior run# »t lull *ia®N®d incl halts only ^ tt»® 

*naxfnl4r»d Br®ak ccr»diOof>s tncluda brsak Doi«ii»# 



sun t'’» ’ 4 ii 

■';m f.p. ! 

Uhtn the 'n-« a n »'ai. simulator displays a 

fflifsaspe to u< '‘i U'Af i 

B35 DBPLAV HCHi: 

The sirsmjiaic/' c»f- dini’-Uy 

tii* 

-DuU nmmwy 
•FVoflr-a* I»»ww.wy 
Adrlr. X f Qumm$ 

• Scralc#^ pad mmmiry 
•Al U , Hullu»h«f 

for the command cell by giving the commands regf, dm, pm, addrq, xq, yq, spad, areg,
mreg respectively. For example, to display register file contents in floating point
format, type

51H) r^g/f Htry 
^Acklr'mm QCkcr^ 

iOlocalioni 4rim o Hill t# displayed m the floalinf ixiint format. 

To display 

■ K n^r, -j 

•V* ffwwcjry 
•Vbl memory 
-Yb2 m(wm.ir“y 

command cell nwKi not be l^si cell 0 OFU) The commands to display these 
"Emories are x, ya. ybl. ybZ re«P«:tiv»lw, 




The Yb memory can be read sequentially, as it was loaded before, by the
command rdyb.

B.3.6 Modify commands

To modify the program counter for the command cell, type

SIM> setpc<cr>

_address : 12<cr>

To change the command cell, type
SIM> element<cr>

_Current command cell = 0
_Number 1<cr>

To change/display the cycle count, type
SIM> cycle<cr>

_Current cycle count = 0
_count 12<cr>

B.3.7 Ending a Simulator Session

At the end of a simulation session, the user can save a segment of data memory,
or ya or yb memory content, by dumping it into a file:

SIM> dumpdm [option]<cr>

_Address 0<cr>

_Length 10<cr>

_file dump.dat<cr>

The option is '-f', to dump floating point data. The default is integer data.


To terminate a simulation session, type

SIM> exit<cr>

_Verify Y<cr>

and you return to the host system.

B.3.8 Help

To recall what commands are supported by the simulator, the user can list all
supported commands with the help command.

SIM> help<cr>

To get help on a particular command, type
SIM> help commandname<cr>

B.4 Simulator commands

This section lists all simulator commands.

DISPLAY CONTROL COMMANDS

to -Forces previous display to scroll forward.

ba -Forces previous display to scroll back.

ya -Displays Ya memory.

dm -Displays data memory.

x -Displays X memory.

xq -Displays X queue.

yq -Displays Y queue.

addrq -Displays address queue.

regf -Displays register file.

pm -Displays program memory.

seqreg -Displays sequencer registers.



yb1 -Displays Yb1 memory.

yb2 -Displays Yb2 memory.
agreg -Displays address register.
mreg -Displays multiplier registers.
areg -Displays ALU registers.

rdyb -Reads Yb memory from the present counter position.
dispint -Displays the interrupt sources.
dispb -Displays break points.


EXIT COMMAND

exit -Exits from the Simulator.


FILE CONTROL COMMANDS AND LOAD COMMANDS

lx -Loads X memory. (Data can be loaded from a file by giving -fl switch)

lya -Loads Ya memory. (Data can be loaded from a file by giving -fl switch)

lyb -Loads Yb memory. (Data can be loaded from a file by giving -fl switch)

ldm -Loads data memory. (Data can be loaded from a file by giving -fl switch)

lcode -Loads program/data memory from obj file.

dumpdm -Forces DM memory segment dump to a file.

dumpya -Forces Ya memory segment dump to a file.

dumpyb1 -Forces Yb1 memory segment dump to a file.

dumpyb2 -Forces Yb2 memory segment dump to a file.

cmdfile -Executes simulator commands found in a command file.

BREAK CONTROL COMMANDS

setb -Sets a PM break address.
clrb -Clears one/all break points.


OPERATION CONTROL COMMANDS

extend -Invokes extend mode.
emulate -Invokes emulate mode.
ss -Invokes single step mode.

MODIFY COMMANDS

clryb -Clears Yb memory read/write

element -Displays/changes command cell number.

setpc -Sets the PC.

cycle -Displays/changes the cycle count.

CONFIGURATION CONTROL COMMANDS

reset -Simulates hardware system reset.
setint -Activates the interrupt source.

EXECUTION CONTROL COMMANDS

run -Starts user program running.

HELP COMMAND

help -Displays command list or help on a command.



APPENDIX C


; THE DEFINITION FILE FOR THE INTERFACE UNIT (IFU).

TITLE INTERFACEUNIT
DEFN_FILE 'SIFUF.DAT'

WORD WIDTH 56 BITS
DATA_LOC_WIDTH 32
PROGRAM_LOC_WIDTH 56
;1

#sequencer: FIELD 49-55,WIDTH 7,DEFAULT cont

OPCODE_BIT_NOTATIONS [
kk ( ;Condition
00 unconditional
01 not-flag
10 flag
11 sign
)

cc ( ;Selects the relevant register(R3-R0) and
C0-C3 ;/or counter(C3-C0).
)

ii ( ;Decides number to be added in case of
I0-I3 ; AIRSP instruction.
)

]

INSTR_NOT_AVAILABLE [
jrc(sign)(c0)
jrc(sign)(c1)
jrc(sign)(c2)
jrc(sign)(c3)
branch(sign)(c0)
branch(sign)(c1)
branch(sign)(c2)
branch(sign)(c3)

]

BRANCH_INSTR_WHICH_NEED_DATA [
ABSOLUTE (
jsa
jda
jdrst
)

RELATIVE (
jsr
jdr
)

]

VALUES £ 

jjump & branch instruction 




noioioi 


irmbe 

0010011 

irmbs 

0010010 

disir 

0010110 

enair 

0110110 

slir 

0010111 

stir 

0110111 

slrivp 

0011101 


;relative address width controls
rel16 0100100
rel12 0100111
rel8 0100110

;miscellaneous instructions
cont 0000000
idle 0010000
ihc 0100101
wcs 0100000

]

;2

#data: FIELD 32-47,WIDTH 16,DEFAULT zero

VALUES [

zero 0000000000000000

]

;3

#address_generator: FIELD 22-31,WIDTH 10,DEFAULT nop

OPCODE_BIT_NOTATIONS [

cc ( ;Comparison register number
c0-c3
)

rrr ( ;Three bit address register number
R0-R7
)

rrrr ( ;Four bit address register number
r0-r15
)

bb ( ;Base (offset) register number
b0-b3
)

ii ( ;Initialisation register number
i0-i3
)

pp ( ;Two bit precision code
p0-p3
)

x ( ;One bit control bit
x0-x1
)

]

VALUES £ 

^looping instructions 

Wine lOllcorrrr 



s/dmc 

i0i^E3ccrrrr 

yadd 

iiccbblrrr 

usub 

iicebbOrrr 

;register transfer instructions
yrtr 000101rrrr
yrtb 0011bbrrrr
yrtc 0010ccrrrr
dti 00001111ii
itr 1000iirrrr
btr 0100bbrrrr
rtd 000100rrrr
ctd 00001100cc
btd 00001101bb
itd 00001110ii

;logical and shift instructions
yor 0111bbrrrr
yand 0110bbrrrr
yxor 0101bbrrrr

USST 

OOOllirrrr 


OOOiiOrrrr 

rst 

0000000001 

dto’ 

OOOOlOiilO 

ortd 

OOOOlOlill 

ssti 

(XJOOlOOiix 

s«tp 

OOOOlOiQpo 

sety 

OOOOOlOOix 

sslr 

000001 iOlx 

salb 

OOOOOilOOx 

s«tu 

oooooioilx 

seta 

OOOOOlOiOx 

NTS 

OOOOiOliOO 

rda 

0000101 101 

Ida 

OOOOOiiiiO 

^iy 

OOOOOliill 

yrav 

iODlblirrrr 

no© 

3 

0000000000 

;4 

#a 9 »irsxiaU„*«l«ct FIELD 2i-2i>HDTH i^JEFAULT nodsel 

VALUES I 


nodMil 

0 

dsal 

3 

^5 


*a9_»ir_SHil»ct ; 
VALUES E 

FELD 20-20^dth noair 

air 

1 

noair 

3 

iS 

0 

iwrita.yb^iwwi; 
VALUES t 

FELD 19-19 width 1, default nowr* 

noMryb 

0 



wryb 1 

} 

;7 

#write_x_queue: FIELD 18-18 width 1 default nowrxq
VALUES [

nowrxq 0

wrxq 1

]

;8

#read_yb_mem: FIELD 17-17 width 1 default nordyb
VALUES [

nordyb 0

rdyb 1

]

;9

#read_ya_mem: FIELD 16-16 width 1 default nordya
VALUES [

nordya 0

rdya 1

]

;10

#read_x_mem: FIELD 15-15 width 1 default nordx
VALUES [

nordx 0

rdx 1

]

;11 

icl«*r_x.ni«ro: FKLD 14-14 widlh 1 dafauit noolrx 
VALUES r 

noolrx 1 

clrx 0 

3 

;12 

icl®ar_ya«.iwiif> FELD 13-13 width 1 okifault noclrya 
VALUES I 

noclrya 0 

olma 1 

3 

;13 

icl«ar„yb«(nero; FELD 12-12 wioWi I dafauli noclryb 
VALUES £ 

noclnJb 0 

clryb 1 

3 

,14 

tdata_anabla. FIELD 11-11 width 1 dafault nokan 
VALUES E 

nokan 0 

lean I 

3 

15 

tflaa^aux; FELD 8-10 width 3 dafault noflay 
VALUES £ 



noflag 

cm 

•ync 

001 

m(3J 

010 

end_ao 

Oil 

«» 

1C30 

at>cS 

101 

abcS 

110 

abc? 

111 


3 

;i6 

#daU Utch.«n.bl* FIELD 7-7 mm, i d«f*ull nodtn 

VALICS l 

d«i i 

nocten 0 

3 

47 

#i«ld 6-6 Hidlh i default rwclatb 

VALUES I 

dsib 1 

nodstJb 0 

3 


#writ<i„y„duiwj> FKLD 4-4 mAH i cialault r>owryq 
VALUES I 

rx3««^ 0 

wryq i 

3 

49 

V^^S*^***'*^^^**^ f’K!LD 3-3 widlh 1 default ncac'addrsq 

nowraddiraQ 0 
wraddraa 1 

3 

ao 

i«_bu*„dir*clH3n FIELD 2-2 MicKJh 1 dafault c«lU«r> 

VALUES f 
«*334-.ri 1 

calln^ 1 0 

3 

21 

4-1 Width 1 chifault nowrx 

VALUES C 

hour* 0 

MT* 1 

3 

;22 

0"0 widUi 1 chrfaull norddn 

V<U.UES E 

hordcim Q 

rddtt 1 

J 


; THE DEFINITION FILE FOR THE CELL UNIT.


TITLE CELL 

DEFN.FILE ’-SCELir f/AT" 

WORD WIDTH BITS 

DATA_LOC_WIDTH 32

PROGRAM_LOC_WIDTH 136

;1 

#MHaM«TC«r FIELD 129*i35i4IDTH 7 DEFAULT coot 

OPa3D€„BiT^NOTAT!ONS £ 
kk < ,Condilir» 

W) unccrxJitionni 
01 r>olFl *9 

10 flag 

11 fifln 

) 

cc I ,S«l®ct» r«l»v«nt r»ti»l«r<R3-R0)and 
C0-C3 ./or count •r<C3»C0T 

) 

11 c ,D*ciehr* numbar to b* *dd»d incai# of 
IO-13 , AIRSP instruction 
5 
3 

»«TR.NOT„AVAILABLE C 
)rc£«iinKcO) 
jrcttignkcll 
Jrc(»iinMc2l 
jrc<»i»nXc3) 
brancHsianKcO) 
brancNtignXci) 
brancW*ignXc2) 
brancMfiinXcS) 

3 

W?ANCH.JWSTR..WtflCH„FCED„DATA £ 

ABSOl,.UTE { 

}sa 

jd« 

jc*r*i 

5 

ILLATIVE ( 
jsr 

) 

3 

_ VALUES t 

ijump & branch instruction 

ipcof OOlOiOi 

ipcnf OHOiOi 

3t*<io lOikkOi 

jdii illkkll 

Wr illNkOl 

rc illidciX 



jsii 

iOikklO 

jdr»t 

iOOllcc 

jr* 

llOllcc 


lUkktX) 


llikklO 

rtn 

lOlkkil 

brimoh 

ICJOIdka: 

;»t«ck opwaticsn 

jsubroutim *t*ck 

{)sctes 

OOliliO 

possd 

OlHiiO 

MTSSP 

0001110 

rd»sp 

0101100 

dssp 

oooot^io 


r 



disir 

0010110 

enair 

01 101 10 

»lir 

0010111 

*tir 

oiiom 

slrivp 

0011101 

^iative addre** width control* 

reiiS 

0100100 

r#ll2 

OiODlil 

r#l8 

01001 10 

. 

A 1 



VALUES t 

zmro CMOOOOOOOOCJOOOOO 
I 

^3 

FELD 112-112 width 1 default nokwi 

VALUES C 

noken 0 

ken i 

3 

;4 

ideta^letch^eneble FELD ill-111 width 1 default nod«> 

VALUES t 

den 1 

fsoden 0 

3 

3 

#data.*trobe field 110- 110 wicRh 1 default nootetJb 

VALUES t 

detb i 

nodi'th 0 

3 

i6 

#addre»*^»enerator FELD 100-109MDTH 10J3eFAtA.T nop 
OPCODE.BIT.NOTATIQI^ t 



bb < .Bast 

bO-b3 ***’^ rtijjsttr 

!j { 

j0-i 3 r*9i*t«r nuBib«~ 


00 i J'wo bit 

0O-p3 ' codt 


X C Xlnt bit - 
xO-*l 


o^ol bit 


VM.UES I 

Jooping in»lruciiQy.j^ 

Mine ^Oii* 

IWM 


iPUb 1 icx;, 


i I 





udiw 

ynrv 




fym 

J 




;7 

#»g„ir«xl«t«„s«l«ct FIELD 99-99>HDTH i,DEFAULT nod*el 

VMJMS £ 

nodiai 0 

ctawl 1 

J 

;8 

#«g„*ir_s«l«ct: FELD 98-98A<idth noair 

VALUES £ 

air i 

noair 0 

3 

,•9 

ifltg_.fnux- FIELD 95-97 Hidih 3 dafauli fig 


VALUES £ 

fig OOO 

cflBpa CX31 

laasthan OiO 

aqual Oil 

graatarihan iOO 
invaloo 10 i 

ovrflo iiO 

undarf lo ill 

3 

;i0 


iwrila_»cratch_pad_iii«» FELD 94-94 Hictth 1, ctefault nowrapmain 
VALUES £ 

noNracxnwD 0 

Hr$H3RMRn 1 

3 

;il 

iraad«»crat^„l>ad_.wii«» : FELD 93-93 wicfth 1 dafault nordspmem 
VALUES £ 

nordapiBaw 0 

rdspnicfn 1 

3 

iwrita«x_ciuaua FELD 92-92 Nicflh 1 dafault nowrxq 
VALUES £ 

noNTxq 0 

wrxci i 

3 

;13 

*writa_y^quaua; FELD 9i-9i width 1 dafault nowryq 
VALUES £ 

nowryq 0 

wryq 1 

3 

;14 _ 

iw'ite^addr^a^ciuaua: FIELD 90-90 wicfth 1 dafault nowraArq 


r«tr«r>*y i 

3 

^22 

i^*d_*^es5„aueu«p FIELD 8i-84 Hidth i ctefwlt fiordacfdrq 

nord*dclra 0 
rdadct^ i 

3 


;23 

FELD 00-80 width i diH^ault noclraddrqMaster 
, R®ad/Writ® count«r» »r« rmmmima. 

VALLES [ 

ncwlraddra 0 
ol^jwkJrts i 
3 


reset. 


^4 

ftr«tr«WRil»acWF^«*_qM«je FELD 79-79 width 4 default noretransaddrq 
^^etrammt 

; Only Read ccxjnter is reset »d 
VALUES I 

noretrarwacidrd 1 
^retransactelrd 0 
3 

#address.c»*msbar.dmaddlrtMii FELD *^78 width 4 default axbcbn agi 
VALUES ( 

axb<lm_a«u. 0 
a)dtx4m._adclraj 4 
3 

;26 

taddNws_cr«ert>ar„«NKldr FELD 77-77 width 4 default axbsp agi 
VALUES I 

axbsp.. agti 0 

axbsp.adclrdi 4 
3 


•M„«iueue«ifii3wt.,select FELD 76-76 widWi 4 default v#r#vious 
VALUES I 

yourrwA i 

yprevious 0 

3 

■m 

ifflui..port.»eleclion FELD 74-’3^ width 2 default noinput 
VALUES I 

noinput 00 

»»ul„aen 04 

•ul..b#n 40 

3 

j29 

*4lu..outpiit.enabla FELD 73-73 width 1 ctefault noaoen 
VALUES £ 

noMwi 0 

aoen 4 

3 


tmul„C3uipui^«^bi« FIELD 72-72 nifMh i dtfiiult noraoan 
values C 

noi«o«ri 0 

WJ«o i 

3 

;31 

FELD 7i-7i width i ctef«tat m»p 

VALUES I 
mm ^ 

mfitmi 0 

3 

^ 3 Z 

§r«SI_lil«_writ»^» FELD 70-70 width i dsfaull r®3_«wrinh 
^iUt r»aAii«r pcrl A 
valuM { 

r«9«*wrinh 0 
rwi.awr 1 

3 

33 

#n«_#il«.wril«„b. FELD 6949 wicfth i d»f«ult r«g_bwrinh 
iJriUi n*»i»lir port B 
valiMB I 

r«fl_bwr 1 

3 

34 

ir*fl_fil«„wril«_« FELD 68-68 width 1 default r«i_«wrxnh 
4*iriU r«ai*l«r port E 
valuM t 

r«tt_«Mrinh 0 
r«9_«wr t 

3 

35 

•rm.#il«-.r««d-0. FELD 6747 ■width I diifiwll ri«*.otri 
Flaad r* 9 i»t«r port C 
VtlUM C 

f«SL.otri 0 

ntt„crd I 

3 

36 

«r«9„fi3a„r*»d_d FELD 6646 wicfth i dafault r«i„dtri 
iR«*d r*in*t«r port D 
values C 

rm-«Ari 0 

rm-drd i 

3 

37 

#rm.Fil«.r«ad.« FELD 6545 width I default ref_etri 
F^ad ritaiater port E 
VliUMt { 

Fef„«tr4 0 

re«-.erd i 

3 



#r««_fa«_*c*clr«*js„* FIELD 58-64 wjcWt 7 Omt^i zimidd . 
0PCODe„BIT„NOTATIONS I 

f0-«127 


) 

3 

valuMi C 

zw"oacld„* OOOOOOO 

mmmM 

f 

i39 

fc^.fil«_«cldr««»^b FELD 51-57 width 7 dufault zw'oacH b 
OPCODE.B!T„NOTATION 6 I 

##•«««• { 

m-IH 27 

5 

3 

vtluts { 

ziif'oadd„b OOOOOOO 

b.addr 

3 


,40 

F ELD 44-50 width 7 flhrfault zw'oadd o 
OPCOOe.Blt.NOTATIOhe I 

t§mnm ( 

•0-il27 

j 

3 

VtlUMS C 

z«rcMidd_c OOOOOOO 

c. addr •••••Ml 

3 

,41 

irMi.fil#„*ddr««*^d FELD 37-43 width 7 ddfault zaroadcLd 
0Pa5D€.BI7. NOTATIONS f 

••••••• ( 

•0-*127 

i 

3 

V&lUMI t 

xm-mOd^d OOOOOOO 

d. »dclr ••••«•• 

3 

;42 

<Nw-fU«.ftdclp«*«,« FELD 30-36 width 7 d«f«ult zart^dd.* 
ORX«3£.»T>NOTATIONS C 

#•••••• ( 

•cMiir? 

3 

1 

VtluMl C 

i»ro«dd.» OOOOOOO 

•.MUr ••••••• 


Flit, I' 14-14 Nic^ 1 dmtmuli mwry» 

values t 

r»own#« 0 

wry» ^ 

3 

i4f 

ialufiELD S"13 Hicfth 9 d#f»u3l »nop 


V/y-UES t 


anop 

OOOOOOOOO 

iadd 

001000011 

imibb 

001001011 

isuba 

001000111 

aandb 

OOOOlOOiO 

aorb 

OOOiOOOiO 

ajosrb 

OOOliOOiO 

«add 

lllCM»Oil 

sjKJbb 

lllOOOill 

asVba 

11 100101 i 

soonp 

111001111 


3 

*50 

i*lu_*p.bu##iir\.dif *cl»an riEi D 4 4 wjcflih 1 ctofAult iodl.lo.in 

VALICS t 

iOut.Lo.m 1 
iui^to.out, 0 
3 

i 5 l 

i*lu_* 0 «toul'f»r_»n«tol«rKLD 3-3 width i dtf*uH •luout.txifdii**^*^ 

VALUES I 

•luout.tjufdisabi* 0 
tluout.bufwi 1 
3 

•«lu_ii>orl„*«3K(ct»cin F flfl.D 2-2 wic^ 1 d*#'»u3t noAinout 

VALUES t 

rmirn*^ 0 

ilu.ii«n 1 

3 

33 

••lu„boort.*#l<iclior^ FIELD M width 1 d*#«all nobinput 

VALUES £ 

rioblfixA 0 

*iu.t 3 «in 1 

3 

•x^qu«j«^inix»t.»«3»cl riELD CM3 width 1 d«#»uit xprtvioui 
VALUES I 

xcurrwit 1 

xnFxtvlous 0 

3 
Program Sequencer


ADSP-1401


16-Bit Microcode Addressing Capability

Extensive Interrupt Processing, With Ten On-Chip
Interrupt Vectors

70ns Cycle Time; 25ns Clock-to-Address Delay

64-Word RAM for Storing:
Subroutine Linkage
Jump Addresses
Counters
Status Register

375mW Maximum Power Dissipation with
CMOS Technology

48-Pin Ceramic or Plastic DIP and
52-Lead Plastic Leaded Chip Carrier

WORD-SLICE MICROCODED SYSTEM WITH ADSP-1401


GENERAL DESCRIPTION 

The ADSP-1401 is a high-speed microprogram controller optimized
for the demanding sequencing tasks found in digital
signal processors and general purpose computers. In addition to
high speed (25ns clock-to-address delay) and large addressing
range (64K of program memory), this Word-Slice component
has unique features that make it highly versatile:

• on-chip storage and control of ten prioritized and
maskable interrupts

• four decrementing event counters

• absolute, relative and indirect addressing capability

• writeable control store capability, and

• a dynamically configurable 64-word RAM.

The ADSP-1401 microprogram sequencer's main task is to
provide the appropriate microprogram addressing to support
programming requirements (e.g., looping, jumping, branching,
subroutines, condition testing and interrupts). An internal Look-
Ahead pipeline, controlled by both phases of the clock, allows
the ADSP-1401 to satisfy these requirements at very high speed.

During each microinstruction, the ADSP-1401 monitors the
conditions and instructions to determine the next microprogram
address. This address can come from one of several sources: the
stack, the jump address space in the RAM, the data port, the
interrupt vectors, or the microprogram counter. An extensive
set of conditional instructions is also available, including jumps,
branches, subroutines, interrupts, and writeable control store.


The ADSP-1401's internal 64-word RAM is user-configurable
into three regions: subroutine stack, register stack and indirect
jump address space. The subroutine stack is used for linking
interrupts and subroutines and, during their execution, allows
storage of system states. The register stack allows association of
unique jump addresses with various levels of interrupts and
subroutines (both local and global stacks are provided). Indirect
jump capability is also supported, addressing for which is provided
at the data port.

Interrupts are handled entirely on chip. The ADSP-1401's internal
interrupt control logic includes registers for eight external (user)
interrupt vectors, a mask register, and a priority decoder. Two
additional vectors are reserved for internally-generated interrupts
resulting from counter underflow and stack limit violation. A
stack limit violation is caused by stack overflow, underflow or
collision. A mechanism is provided for recovering from stack
violations.

The ADSP-1401's four decrementing 16-bit counters are used to
track loops and events. These counters generate a signal when
negative. This negative condition is used by several conditional
instructions and can also trigger an internal interrupt.
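
The behaviour of one such counter can be modelled in a few lines. This is only a behavioural sketch under a stated assumption: "negative" is taken to mean that the sign bit of the 16-bit two's complement value is set. The class name and methods are hypothetical and do not describe the device's registers.

```python
class EventCounter:
    """Behavioural model of one decrementing 16-bit counter.

    Assumption: the 'negative' signal is the sign bit (bit 15) of the
    16-bit two's complement counter value.
    """

    def __init__(self, value: int):
        self.value = value & 0xFFFF

    def decrement(self) -> None:
        self.value = (self.value - 1) & 0xFFFF  # wraps from 0 to 0xFFFF

    @property
    def negative(self) -> bool:
        return bool(self.value & 0x8000)


def passes_until_negative(initial: int) -> int:
    """Count loop passes when the counter is tested, then decremented."""
    counter, passes = EventCounter(initial), 0
    while not counter.negative:
        passes += 1
        counter.decrement()
    return passes


if __name__ == "__main__":
    print(passes_until_negative(2))
```

Under this model a counter loaded with n allows n+1 passes before the negative condition appears, which is why a loop of K passes would be set up with a smaller initial count.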


... trademark of Analog Devices, Inc.
Figure 1. ADSP-1401 Block Diagram


ADDRESSING MODES

Direct, both absolute and relative
Indirect, from internal RAM

HARDWARE FEATURES 

Instruction Port
Bidirectional Data Port
Four Input Address Multiplexer
Three Stack Pointers
Four Event Counters
Condition Flag
Eight Prioritized and Maskable User Interrupts
TTR Pin: Trap, Three-State, Reset


INSTRUCTION TYPES 
Jumps and Branches
Stack Operations
Status Register Operations
Counter Operations
Interrupt Control
Relative Address Width Controls
Instruction Hold Control
Writeable Control Store
Dedicated Counter Underflow Interrupt 
Dedicated Stack Overflow Interrupt 


ADSP-1401 PIN ASSIGNMENTS

Pin Name     Description

...          The 7-bit microinstruction ... ADSP-1401.

Y15-Y0       Output bus addressing ... program memory.

D15-D0       Bidirectional data bus for transferring data to or
             from the ADSP-1401.

EXIR 4-1     Four external interrupt request lines. Note: internal
             circuitry supports 8 interrupts with the aid of
             an external 2 to 1 multiplexer.

CLK          External clock input.

FLAG         An input used for conditional instructions. Its
             source is usually a condition multiplexer.

TTR          A multi-purpose pin accommodating traps, output
             disable and reset.

Vdd          +5 Volt supply.

GND          Ground.
1.1 Pipeline

The Look-Ahead pipeline is split into two halves: the
first is located at the instruction and data ports and the second,
located at the address port. Each half of the pipeline (input vs.
output) has a transparent latch which operates out of phase with
the other; the address latch is transparent during the first half
of the cycle (clock HI) while the input latches (instruction and
data) are transparent during the second half of the cycle (clock
LO). This complementary arrangement allows new instructions
to be decoded (in preparation for the following cycle) while the
address for the current cycle is held steady.

1.2 Instruction Port

The instruction port receives 7-bit instructions defining the next
operation to perform from microcode. The ADSP-1401 has a
built-in Look-Ahead pipeline latch, eliminating need for an
external microcode latch to hold instructions. This incorporation
has the further benefit of allowing instruction "look-ahead"; the
sequencer is able to decode the next instruction during execution
of the current cycle. During the "look-ahead" period, the sequencer
precalculates the next address, allowing its output as early as
possible in the next cycle.

External instructions are internally latched during clock HI, and
passed directly to the instruction decoder during clock LO
(transparent phase), thus implementing the first half of the
Look-Ahead pipeline latch.

The use of the instruction hold mode (see: Instruction Set
Description, 2.7, and Instruction Hold Control, appendix 4.1)
allows an instruction to be held in the instruction latch for
execution over several cycles (freeing microcode for use by other
devices).

1.3 Address Port and Multiplexer Sources

The address port provides 16-bit program addresses with three-
state drivers designed for driving large microcode memories.
Addresses come from a four-to-one microprogram address
multiplexer. Between the multiplexer and output port is a transparent
latch which is transparent during clock HI and latched during
clock LO, permitting addresses to be output as early as possible
during phase one (clock HI) while holding the address constant
during phase two (clock LO), implementing the second half of
the Look-Ahead pipeline latch.

Inputs to the microprogram address multiplexer are the:

• 16-Bit Program Counter

• 16-Bit Adder

• Interrupt Vector File and

• Internal 64-Word RAM.

Addressing Modes

The ADSP-1401 supports two addressing modes: direct and
indirect. The direct addressing mode uses the internal adder to
generate either absolute addresses from the data port (without
modification) or relative addresses from the program counter
(with or without extension: see Status Register, 1.4.4). The
indirect addressing mode uses the lower order bits at the data
port to access the contents of internal RAM for output.


Output Drivers

The address port output drivers are always active unless placed
in the high-impedance state by the IDLE instruction or by
appropriately asserting the TTR pin (see TTR Pin, 1.7). This allows
other devices to supply microcode addresses, which is particularly
useful in multi-tasking or context switching applications where
ADSP-1401s may be sharing common microcode
memory.


1.3.1 Program Counter

The program counter (PC) consists of a 16-bit incrementing
counter. For most instructions, the PC is incremented by the
end of the cycle (post-increment) as follows:

PC <= output address + 1.

1.3.2 Adder and Width Control 

For absolute jumps, data from the data port is passed unchanged
through the adder directly to the microprogram address port.
For relative jumps, a twos complement offset is supplied from
the data port and added with the 16-bit PC. Since the PC normally
points to the next instruction, the jump distance is (offset + 1)
from the jump instruction. See Status Register (1.4.4) for more
details.

The width control block permits microcode width to be reduced
in systems not requiring full, 16-bit jump distances. Internal
width control logic sign-extends reduced offsets of 8 and 12 bits
to full 16-bit precision, accommodating jumps in either direction
(positive or negative displacement).
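
The relative-jump arithmetic described above can be sketched in a few lines of Python. This is only an illustrative model, not vendor code: the function names `sign_extend` and `relative_jump_target` are hypothetical, and wrap-around of the result to 16 bits is an assumption.

```python
def sign_extend(offset: int, width: int) -> int:
    """Interpret the low `width` bits of `offset` as a two's complement value."""
    value = offset & ((1 << width) - 1)
    sign_bit = 1 << (width - 1)
    return value - (1 << width) if value & sign_bit else value


def relative_jump_target(pc: int, offset: int, width: int = 16) -> int:
    """PC plus the sign-extended offset, wrapped to 16 bits (wrap is assumed)."""
    return (pc + sign_extend(offset, width)) & 0xFFFF


# A jump instruction at 0x0100 leaves the PC at 0x0101 (post-increment),
# so the distance from the jump instruction itself is (offset + 1).
jump_address = 0x0100
pc = (jump_address + 1) & 0xFFFF
forward = relative_jump_target(pc, 0x05, width=8)   # 8-bit offset of +5
backward = relative_jump_target(pc, 0xFB, width=8)  # 8-bit offset of -5
```

With these inputs, `forward` lands six locations past the jump instruction and `backward` four locations before it, which matches the (offset + 1) rule.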

1.3.3 Interrupt Vector File

Ten prioritized interrupt vectors may be stored in the interrupt
vector file. The associated interrupts are internally latched and
may be individually masked or entirely disabled by the "Disable
Interrupts" (DISIR) instruction. The highest priority interrupt
vector displaces the usual address on the next cycle following its
detection. See Interrupts (1.4.3) for more details.

1.3.4 Internal RAM

Any of the 64 words of RAM may be output to the address
port. Four distinct address sources may access the RAM:

• Local Stack Pointer

• Global Stack Pointer

• Subroutine Stack Pointer and

• Lower Order Data Port Bits.

The use of internal RAM and its various address sources is
described in section 1.4.2.


1.4 Bidirectional Data Port

The 16-bit bidirectional data port (D15-0) supplies direct or
indirect jump addresses and permits loading or dumping of all
internal registers ... The input data latch freezes incoming data
during the first half-cycle (clock HI) and is transparent for the
remainder of the cycle. The output data driver asserts output data
during ... output instruction and is ... complementary
I/O arrangement ... be output from the sequencer
during the second half-cycle.
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|j.,l 

ini fWfii irw.k.^ them mmm% iwoi 

tw rfrtTfWnml «w pffhm^ed ilirwi|li Mimi^ 
miim ?wfi> llw* »ti?’ «•* ^ mcfiilf mwe4 

Ig^'Kif m III .4f* rt ifW'Ri . « dwivs 'Wf'fd «t i^ iiami 

ffgaut SmuJtMfWiwh , ilw w|f'- kt li iitii itt-iabk 

ifli'a»ifi'^l *»fwwi ifiiirvitK«i « Im i^mmmg ilic 

kwnJ mwffui^, IK^, rrwnrf cw*«!cf 

*irf li^iifwiiiw- Sft ^ 9»d lf»ifrn*.p^i, l..4Jj. 
Figure 2. Typical RAM Initialization

Register Stack Pointers (LSP and GSP)

Upon entering a routine, up to four jump addresses may be
pushed onto the register stack. A Push onto the register stack
first decrements the RS pointer (either LSP or GSP, depending
upon the status register) and then writes the appropriate data to
RAM. A Pop from the register stack first reads the RAM location
and then increments the RS pointer (LSP or GSP).

Four registers are available within the context of any routine,
which are addressed relative to the stack pointer (LSP or GSP) by
the two LSBs of the relevant instruction. For example, the
instruction:

IF CONDITION, JMP R2

accesses the location (LSP+2 or GSP+2) in RAM as the
conditional address source. Prior to exiting a routine, local or global
registers can be effectively removed from the RS by the "ADD i
TO RSP" (AIRSP) instruction (see Instruction Set Description,
2.2).
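
As a rough illustration of the register stack behaviour described above, the following Python sketch models a pre-decrement push, a post-increment pop, and pointer-relative register access. The class `RegisterStack` is hypothetical, and the 6-bit wrap-around of the pointer is an assumption based on the 64-word RAM size.

```python
class RegisterStack:
    """Toy model of the register stack in the 64-word RAM (hypothetical helper)."""

    def __init__(self, ram, pointer):
        self.ram = ram                # shared 64-word RAM, modelled as a list
        self.rsp = pointer & 0x3F     # LSP or GSP; 6-bit wrap is an assumption

    def push(self, value):
        # Push: decrement the pointer first, then write to RAM.
        self.rsp = (self.rsp - 1) & 0x3F
        self.ram[self.rsp] = value & 0xFFFF

    def pop(self):
        # Pop: read the RAM location first, then increment the pointer.
        value = self.ram[self.rsp]
        self.rsp = (self.rsp + 1) & 0x3F
        return value

    def register(self, i):
        # Ri is the word at (pointer + i), with i taken from two instruction LSBs.
        return self.ram[(self.rsp + (i & 0x3)) & 0x3F]


ram = [0] * 64
rs = RegisterStack(ram, pointer=32)
for jump_address in (0x0300, 0x0200, 0x0100):
    rs.push(jump_address)
r2 = rs.register(2)   # the word at RSP+2, as used by "IF CONDITION, JMP R2"
```

After the three pushes the pointer sits at location 29, so R0, R1 and R2 refer to the addresses in reverse push order.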

Often, the same set of jump addresses is used by several different
routines. The GSP is available for addressing these common
registers, conserving RAM space and eliminating repeated
stack pushes and pops. Global registers can be pushed, popped,
and used by conditional instructions in the same way that local
registers are handled. In addition, the GSP can itself be pushed
and popped to/from the subroutine stack, allowing different
routines to access different subsets of the global stack area.

Subroutine Stack Pointer (SSP)

A Push onto the SS (jump subroutine or interrupt) first increments
the SSP and then writes the return address to RAM. A pop
from the SS first reads the return location and then decrements
the SSP, effectively removing the data from the stack (although
the data remains in RAM). For interrupts, the return address is
the one that would have been output in the cycle when the


... vector (IRV9-0) in the cycle following the interrupt request.

The external interrupts (IR8-1) may be used for any purpose;
unused inputs must not be left floating (i.e., tie them
to logic LO so as to preclude the associated interrupt). Two
additional interrupts which are internal are reserved for stack
overflow, IR9 (see Stack Limit Register and Stack Overflow,
1.4.2) and counter underflow, IR0 (see Counters, 1.4.1). See
Counters (1.4.1) for implications of using IR0 for other than
writeable control store downloading.

Interrupt vectors are always output (assuming interrupts are
enabled and the associated interrupt is not masked) on the cycle
immediately following the acceptance of the interrupt request.
Contextual saves (stacking and storing) should be made
immediately upon entering the interrupt service routine and restored
immediately prior to its exit.

Up to four external interrupts may be connected directly to the
external interrupt pins, EXIR4-1, and are treated as interrupts
IR8-5, respectively. Lower priority interrupts, IR4-1, must be
masked out in this case.

Up to eight external interrupts may be accommodated using
time-division multiplexing. An external 2:1 multiplexer reduces
the eight external interrupts to two groups of four (see Figure
3). An internal de-multiplexer automatically restores the external
interrupts back to eight.

The interrupt vector file may be directly read and written via
the data bus with the aid of the Interrupt Vector Pointer (see
Instruction Set Description, Interrupts, 2.5).



Figure 3. Expanding External Interrupts

IR Latch

Interrupt requests IR8-5 are latched during the first half-cycle
(clock HI), while IR4-1 are latched during the second half-cycle
(clock LO). Once latched, external interrupt requests are held
until processed, even if the external request signal goes away.
This latching technique allows removal of external interrupt
sources after they have been recognized by the sequencer.

Latched user interrupt requests (IR8-1) are held until: i) the
interrupt is processed and a "Return from Interrupt" (RTNIR)
instruction is executed; ii) the interrupt service routine executes
a "Clear Current Interrupt" instruction (allowing nested
interrupts); or, iii) a "Clear All Interrupts" instruction is executed.
Reserved interrupts (IR9 and IR0) are cleared from the interrupt
latch by utilizing the SLRIVP and CLRS instructions, respectively.
See Internal IR Control Logic (1.4.3) for details.

The user may bypass the interrupt latch with the "Select
Transparent Interrupts" (STIR) instruction (setting status register bit
SR0). In the transparent mode, the interrupting device must
hold the interrupt request until the interrupt service routine
resets the request source.
Figure 4. Internal Interrupt Control Logic


IR Mask

All ten interrupts may be independently masked using status
register bits SR15-6, corresponding to interrupts IR9-0. Setting
a particular mask bit prevents the interrupt from being executed.
Note that the status register may be read or written via the data
port and also pushed and popped to/from the subroutine stack,
allowing nesting and servicing of interrupts in any desired order
(see Internal IR Control Logic, 1.4.3; and Status Register,
1.4.4).

Two instructions allow bitwise clearing or setting of the interrupt
mask. "IR Mask Bit Clear" (IRMBC) will clear those mask bits
for which the corresponding data bits, as applied to
IR9-0, are set, while "IR Mask Bit Set" (IRMBS) will set those
mask bits for which the corresponding data bits are set. In both
cases, zeros in the data field will preserve the corresponding
mask bit. See Instruction Set Description - Status Register, 2.3.
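
The bitwise mask operations above amount to simple AND/OR logic on the upper ten status register bits. The sketch below is a minimal Python model; the function names are hypothetical, and treating data bits D5-0 as ignored for IRMBS as well as IRMBC is an assumption.

```python
MASK_FIELD = 0xFFC0  # SR15-6 hold the interrupt mask; SR5-0 are control bits


def mask_bit(ir: int) -> int:
    """Status register bit that masks interrupt IRn (IR9-0 map to SR15-6)."""
    return 1 << (ir + 6)


def irmbc(sr: int, data: int) -> int:
    """Clear the mask bits whose data bits are 1; zeros preserve the mask bit."""
    return sr & ~(data & MASK_FIELD) & 0xFFFF


def irmbs(sr: int, data: int) -> int:
    """Set the mask bits whose data bits are 1; zeros preserve the mask bit."""
    return (sr | (data & MASK_FIELD)) & 0xFFFF


sr = 0x0004                                        # only a control bit set
sr = irmbs(sr, mask_bit(9) | mask_bit(0) | 0x3F)   # mask IR9 and IR0
masked_both = sr
sr = irmbc(sr, mask_bit(0))                        # unmask IR0 again
```

Note that the low six data bits never reach the control field in this model, so mode bits are unaffected by mask updates.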

IR Priority Decoder

Unmasked interrupts are passed to the priority decoder which
determines the most urgent, valid interrupt and generates an
internal Interrupt Request Signal (IRS). The corresponding
vector is then fetched from the interrupt vector file and passed
to the address port.

Minimum IR Servicing Requirements

Interrupt vectors are output on the cycle following the acceptance
of an interrupt request. Interrupt jumps differ from subroutine
jumps in that subroutine jumps push the return address in the
same cycle as the jump address is output, whereas interrupt
return addresses are not pushed until the following cycle. This is
because the instruction executing while the interrupt vector is
output may be utilizing RAM and must complete its execution
prior to pushing the interrupt return address. Thus, the PC
(interrupt return address) is pushed automatically in the first
cycle of the interrupt service routine, i.e., the cycle following the
interrupt request acceptance.

For this reason, the first instruction of any interrupt service
routine is always ignored; it must be a no-op (CONT). Note that
a minimum interrupt service routine would be a CONT followed
by a RTNIR.

Internal IR Control Logic

The interrupt enable bit of the status register, SR2, must be set
for interrupt servicing to occur. Interrupt servicing may be
inhibited by clearing this bit, although external interrupt requests
will continue to be latched.

Only one interrupt is ever active at a time. Additional interrupts
are "locked out" by an internal "Interrupt In Progress" signal
(IRIP) during interrupt servicing (except for TRAP), although
they continue to be latched. The IRIP signal is automatically
reset upon the "Return from Interrupt" (RTNIR) instruction,
which pops the return address from the subroutine stack to the
PC.

Normally, multiple interrupts are accumulated in the interrupt
latch. Whenever a valid interrupt is pending, the internal signal
"Interrupt Request" (IRQ) is asserted. Upon each RTNIR, the
highest priority, unmasked, pending interrupt is serviced.
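
The selection rule described above (enabled, not locked out by IRIP, unmasked, highest priority first) can be sketched as follows. This is a simplified Python model with a hypothetical function name; it assumes that a higher interrupt number means a higher priority, and it ignores the TRAP exception to the IRIP lock-out.

```python
def next_interrupt(latch: int, mask: int, enabled: bool, in_progress: bool):
    """Return the number of the interrupt to service next, or None.

    `latch` and `mask` are 10-bit values with bit n standing for IRn
    (the mask corresponds to SR15-6 shifted down by six bits).
    Assumption: IR9 has the highest priority and IR0 the lowest.
    """
    if not enabled or in_progress:
        return None
    candidates = latch & ~mask & 0x3FF
    if candidates == 0:
        return None
    return candidates.bit_length() - 1   # index of the highest set bit


pending = (1 << 5) | (1 << 2)            # IR5 and IR2 are latched
first = next_interrupt(pending, 0, enabled=True, in_progress=False)
with_ir5_masked = next_interrupt(pending, 1 << 5, enabled=True, in_progress=False)
```

Requests that are not selected stay in `latch`, mirroring the statement that interrupts continue to be latched while servicing is inhibited.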
Nested interrupts are supported with two instructions: "Clear
Current Interrupt" (CCIR) or "Clear All Interrupts" (CAIR).
The CCIR instruction clears the IRIP signal and interrupt latch
bit for the interrupt in progress. This action re-enables
interrupting, relegating the interrupt in progress to a subroutine
status. If an external interrupt is pending, the associated IR
vector will be output on the cycle following CCIR. To cancel all
pending interrupt requests, the CAIR instruction clears the
IRIP signal and the entire interrupt latch.

Normally, it is good practice to convert interrupts to subroutines.
This can be done by executing the "Clear Current Interrupt"
(CCIR) instruction (resetting IRIP) and should be done as early
as possible in the interrupt service routine. There are two reasons
for changing the status of an interrupt to that of a subroutine.
Firstly, if IRIP is allowed to remain active throughout the interrupt
service routine, then the occurrence of either internal interrupt
(stack overflow or counter underflow, IR9 or IR0, respectively)
will remain undetected until the current interrupt concludes;
the user will be unaware of these interrupt requests.

When using the TRAP capability (see TTR Pin, 1.7), there is a
second reason to clear IRIP. Because TRAP must have the
highest priority, its interrupt (when invoked by a TRAP request)
is not locked out by IRIP. This allows TRAP to displace an
interrupt in progress, but also means that upon completion of
the trap service routine, IRIP will be cleared by the RTNIR
instruction, re-enabling interrupting in spite of the incomplete
interrupt which TRAP displaced.

Either of these instructions (CCIR or CAIR) requires an "extra"
cycle before a pending interrupt vector may be output. A typical
scenario is an interrupt in progress, IRn (containing a CCIR
instruction), with an interrupt pending, IRm:
1.4.4 Status Register

The ADSP-1401 has a 16-bit status register for storing various
operational modes. The ten MS bits of this register (SR15-6)
comprise the interrupt mask for interrupts IR9-0, respectively.
The remaining six LS bits (SR5-0) control the operational modes
as shown below.


The status register can be directly read and written via the data
port and also pushed and popped to/from the subroutine stack.
In addition, status register bits SR15-6 (the interrupt mask) may
be bitwise cleared or set with dedicated instructions. See:
Instruction Set Description - Status Register, 2.3; and Interrupts
- IR Mask, 1.4.3.

1.5 Clock

The input clock employs both HI and LO levels to control the
various transparent latches throughout the device. Generally,
the clock should be symmetric; however, in some instances the
clock may be stretched during the second half-cycle (LO) to
accommodate unusual circumstances such as a cache memory
miss (see: TTR Pin - Trap, 1.7).

1.6 External Flag 

The external flag input may be used to control conditional
instructions. FLAG is latched similarly to instructions (latched
during clock HI and transparent during clock LO), but requires
less setup time. Two instructions make explicit use of FLAG as
their condition (JPCOF and JPCNF), while others employ a
condition mode selection (UNCONDITIONAL, NOT FLAG,
FLAG, or SIGN; see Instruction Set Description, 2.0) to be
specified as part of their opcode.

1.7 TTR Pin (Trap, Three-State and Reset) 

The Trap, Three-State and Reset pin (TTR) is a time-multiplexed,
three-purpose pin used to

• provide program trap capability

• control the address port output drivers and

• reset the ADSP-1401.

If the TTR pin is held HI for an entire cycle, the RESET sequence
begins and TTR must be held HI for at least two more complete
cycles (RESET requires three cycles to complete). If trap and
three-state control capabilities are also needed, the combination
of the 1401's internal circuits and the external circuitry shown
in Figure 5 can be used to effectively time-multiplex the TTR
pin.




Status Register Bits

Bit(s)     Function

SR15-6     IR Mask Bits (interrupts IR9-0)

SR5-4      Relative Jump Width Selection:
             '00' = 16-bit relative address width
             '01' = 8-bit width
             '10' = IHC Mode (8-bit width)
             '11' = 12-bit width

SR3        Select GSP/LSP

SR2        Enable/Disable Interrupts

SR1        Set/Clear Sign Bit

SR0        Select Transparent/Latched Interrupts
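
To make the bit layout concrete, here is a small Python sketch that unpacks a 16-bit status word into its fields. The names `Status` and `decode_status` are hypothetical; the bit positions follow the layout as reconstructed from the status register description and the reset table, so treat them as a reading of this document rather than a verified register map.

```python
from typing import NamedTuple

# SR5-4 encoding of the relative jump width, in bits
WIDTHS = {0b00: 16, 0b01: 8, 0b10: 8, 0b11: 12}


class Status(NamedTuple):
    interrupt_mask: int          # SR15-6, bit n masks IRn
    relative_width: int          # from SR5-4
    ihc_mode: bool               # SR5-4 == '10'
    use_gsp: bool                # SR3: HI selects GSP, LO selects LSP
    interrupts_enabled: bool     # SR2
    sign: bool                   # SR1
    transparent_interrupts: bool # SR0: HI = transparent, LO = latched


def decode_status(sr: int) -> Status:
    width_sel = (sr >> 4) & 0b11
    return Status(
        interrupt_mask=(sr >> 6) & 0x3FF,
        relative_width=WIDTHS[width_sel],
        ihc_mode=width_sel == 0b10,
        use_gsp=bool(sr & 0x0008),
        interrupts_enabled=bool(sr & 0x0004),
        sign=bool(sr & 0x0002),
        transparent_interrupts=bool(sr & 0x0001),
    )


after_reset = decode_status(0x0000)   # the status register is zero after RESET
example = decode_status(0x8034)       # IR9 masked, 12-bit offsets, interrupts on
```

Decoding the all-zero word reproduces the documented reset state: 16-bit offsets, LSP selected, interrupts disabled, sign cleared, latched interrupt mode.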


Figure 5. External Logic for TTR Pin 


Trap 

For a trap to occur, the TTR pin must be asserted during clock
LO only. The primary reason to invoke a trap is in support of
cache memory systems, or in case of system emergencies. Cache
memory systems generally utilize a large microcode memory
space, of which only a small area (that currently under execution)
is comprised of high-speed RAM (the balance consisting of
slower, less costly memory). The high-speed RAM is directly
accessible by the sequencer, whereas the bulk of (slow) memory
is usually accessible indirectly (via a cache memory controller
which controls downloads of code to the cache memory area).
Sequencer State after RESET Operation

Parameter                        Reset Condition

Program Counter                  Microcode Location 0000 (hex)
Subroutine Stack Pointer (SSP)   RAM Location 00 (decimal)
Local Stack Pointer (LSP)        Undefined
Global Stack Pointer (GSP)       Undefined
Stack Limit Register (SLR)       RAM Location 32 (decimal)
RAM Data                         No Change
Counters                         No Change
Interrupt Mask (SR15-6)          All Bits to '0' (Unmasked)
Interrupt Vector File            No Change
Interrupt Vector Pointer (IVP)   Undefined
SR5-4                            '00' (16-Bit Relative Offsets)
SR3                              '0' (LSP Selected)
SR2                              '0' (Interrupts Disabled)
SR1                              '0' (Sign Bit Cleared)
SR0                              '0' (Latched Interrupt Mode)
Writable Control Store Mode      Cleared

NOTE:

The first instruction (microcode location 0000 hex) must be a "CONT".
Jump and branch instructions provide flow control of microcode
execution, offering three-way branches, jumps, subroutine calls,
returns, and ... (see Figure ...). These
instructions support conditional control, allowing addressing
from the register stack, the data port, or the indirect jump
address space in the RAM. Generally, they are of the form:

If Condition: Do Operation; Else, Continue.

JPCOF   IF FLAG: JUMP PC

The address is not incremented while the flag is at a logic HI,
i.e., PC <= PC. If the flag is LO, the next address is (PC+1).

JPCNF   IF NOT FLAG: JUMP PC

The address is not incremented while the flag is at a logic LO,
i.e., PC <= PC. If the flag is HI, the next address is (PC+1).

IXWO    IF CONDITION: JUMP PC+2

If the condition specified is met, this instruction causes the next
sequential microprogram address to be skipped. This instruction
allows single instruction bypassing or interleaving without need
to provide explicit addressing.

pyji    IF CONDITION: JUMP DATA, ABSOLUTE

If the specified condition is met, this instruction causes a jump
to the absolute address at the data port. If the condition is not
met, the next sequential instruction will be executed.
JSE     IF CONDITION: JUMP SUBROUTINE, RELATIVE

If the condition is met, the address at the data port is
added to the PC and output (jump distance is offset plus one)
and the PC is pushed onto the subroutine stack. The offset
width is determined by the address width selection (8, 12, or
16 bits). If the condition is not met, the next sequential instruction
will be executed.

RTN     IF CONDITION: RETURN FROM SUBROUTINE

This instruction is used to return from subroutines. If the condition
specified is met, the subroutine stack is POPped, which outputs
the return address and decrements the SSP. If the condition is
not met, the next sequential instruction will be executed.

BRANCH  IF SIGN OF Ci: JUMP Ri, Ci <= Ci - 1;
        ELSE, IF CONDITION: JUMP DATA, Ci <= Ci - 1;
        ELSE, Ci <= Ci - 1 (COND ≠ SIGN)

This instruction implements a three-way branch with the address
source from the data port, register Ri, or the PC. The instruction
first tests the sign bit of the counter Ci; if negative, the output
address is given by Ri, i.e., RSP+i. If the sign was not true,
but the specified condition is true, the address source is the data
port. If the sign was not true and the condition is not met, the
next sequential instruction is executed.

The counter and the register use the same subscript value i.
The counter is decremented in every case. Note that this instruction
uses only absolute data addresses; relative addressing is not
available with the three-way branch instruction.
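
The address selection of the three-way branch can be written out as a short decision function. The Python sketch below is illustrative only: the function name and argument names are hypothetical, and it assumes the sign bit is sampled before the decrement and that `pc` already holds the next sequential address.

```python
def three_way_branch(counter: int, condition: bool, r_i: int, data: int, pc: int):
    """Return (next_address, new_counter) for the BRANCH instruction.

    counter: 16-bit value of Ci; r_i: contents of register Ri (RAM at RSP+i);
    data: absolute address at the data port; pc: next sequential address.
    """
    if counter & 0x8000:          # sign bit of Ci set: take the register address
        target = r_i
    elif condition:               # otherwise test the specified condition
        target = data
    else:                         # fall through to the next sequential instruction
        target = pc
    return target & 0xFFFF, (counter - 1) & 0xFFFF   # Ci is decremented in all cases


taken_on_sign = three_way_branch(0x8000, False, r_i=0x0010, data=0x0020, pc=0x0030)
taken_on_cond = three_way_branch(0x0001, True, r_i=0x0010, data=0x0020, pc=0x0030)
fall_through = three_way_branch(0x0000, False, r_i=0x0010, data=0x0020, pc=0x0030)
```

The third case shows the counter wrapping from zero to FFFF hex, after which the sign test would succeed on the next execution.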

2.2 Stack Operations 
Subroutine Stack

Subroutine Stack Pointer (SSP) instructions are used for
maintaining the subroutine stack. These instructions may also be
used to upload or download the entire RAM for examination,
stack expansion or context switches.

PSDSS   PUSH DATA ONTO SS

Increments the stack pointer and then loads the RAM location
specified by the SSP with the data at the data port.

PPSSD   POP SS TO DATA PORT

Transfers the contents of the stack location given by the stack
pointer to the data port and decrements the stack pointer.

WRSSP   WRITE SSP

Loads the SSP with bits D5-0 of the data port.

RDSSP   READ SSP

Read the 6-bit subroutine stack pointer. This allows the value of
the stack pointer to be saved or examined. Bits D5-0 of the data
port correspond to bits 5-0 of the SSP. The 10 MSBs of the
data port (D15-6) are undefined.

DSSP    DECREMENT SSP

Decrements the stack pointer without reading.
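
The subroutine stack grows in the opposite direction to the register stack: it increments before a write and decrements after a read. The following Python sketch models that behaviour; the class `SubroutineStack` and its method names are hypothetical, and the 6-bit pointer wrap is an assumption.

```python
class SubroutineStack:
    """Toy model of the subroutine stack in the 64-word RAM (hypothetical helper)."""

    def __init__(self, ram, pointer=0):
        self.ram = ram
        self.ssp = pointer & 0x3F     # 6-bit pointer; wrap-around is assumed

    def push(self, value):
        # Push data onto SS: increment the SSP, then write at the new location.
        self.ssp = (self.ssp + 1) & 0x3F
        self.ram[self.ssp] = value & 0xFFFF

    def pop(self):
        # Pop SS to data port: read the current location, then decrement the SSP.
        value = self.ram[self.ssp]
        self.ssp = (self.ssp - 1) & 0x3F
        return value

    def decrement(self):
        # Decrement SSP: drop the top entry without reading it.
        self.ssp = (self.ssp - 1) & 0x3F


ram = [0] * 64
ss = SubroutineStack(ram)       # the SSP is at RAM location 0 after RESET
ss.push(0x1234)
ss.push(0x5678)
top = ss.pop()
```

After the pop, the popped word is still present in RAM; only the pointer has moved, as the description of the subroutine stack pointer notes.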

Register Stack

Register Stack Pointer (RSP) instructions are used to upload
and download the entire RAM for initialization, examination, or
context switching and to maintain the RAM space allocated to
local and global jump registers. As previously discussed, register
stack instructions refer to either the Local Stack Pointer (LSP)
or Global Stack Pointer (GSP), depending upon the status
register (SR3). If SR3 is LO, register stack instructions pertain
to the LSP. If SR3 is HI, register stack instructions pertain to
the GSP.

SGSP    SELECT GSP

Select the Global Register Stack Pointer. Set Status bit SR3
(HI).

SLSP    SELECT LSP

Select the Local Register Stack Pointer. Clear Status bit SR3
(LO).

RDRSP   READ RSP

Transfers the RSP to the data port bits D5-0 for examination or
storage. The 10 MSBs (D15-6) of the D port are undefined.

WRRSP   WRITE RSP

Preload the selected RSP (LSP or GSP) with bits D5-0 of the
data port.

PSPC    PUSH PC ONTO RS

Decrements the RSP and writes the PC to the register stack.
This instruction may be used to set up a JRC loop (IF
CONDITION: JUMP R * PC).

PSGSP   PUSH GSP ONTO SS

Increment the SSP and write the GSP onto the subroutine
stack.

PPGSP   POP GSP FROM SS

Write the subroutine stack to the GSP and decrement the SSP.

PSDRS   PUSH DATA ONTO RS

Decrement the RSP and then write the data at the data port
into the location specified by the updated RSP.

PPRSD   POP RS TO DATA PORT

Transfers RAM data pointed to by the RSP to the data port and
then increments the RSP.

AIRSP   ADD i TO RSP

Add i to the register stack pointer. Note that i = 0, 1, 2, or 3 in
this instruction corresponds to 4, 1, 2, or 3, respectively. This
instruction effectively removes up to four registers from the
stack.

S1RSP   SUBTRACT ONE FROM RSP

Subtract 1 from the RSP without a write. This instruction is
used to modify the RSP without explicitly reloading it.

S4RSP   SUBTRACT FOUR FROM RSP

Subtract four from the RSP without a write. This instruction
may be used to modify the RSP without explicitly reloading it.
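
The pointer adjustments above are easy to get wrong because an encoded value of 0 in the ADD i instruction stands for 4. A minimal Python sketch of the arithmetic follows; the function names are hypothetical and the 6-bit wrap-around of the pointer is an assumption.

```python
def add_i_to_rsp(rsp: int, i: int) -> int:
    """ADD i TO RSP: an encoded i of 0 means 4; 1, 2 and 3 mean themselves."""
    i &= 0x3
    amount = 4 if i == 0 else i
    return (rsp + amount) & 0x3F


def subtract_one_from_rsp(rsp: int) -> int:
    """Move the pointer down by one word without writing to RAM."""
    return (rsp - 1) & 0x3F


def subtract_four_from_rsp(rsp: int) -> int:
    """Move the pointer down by four words without writing to RAM."""
    return (rsp - 4) & 0x3F


rsp = 32
rsp = subtract_four_from_rsp(rsp)   # reserve four registers without writes
reserved = rsp
rsp = add_i_to_rsp(rsp, 0)          # i = 0 releases all four again
```

Reserving with a subtract and releasing with an add of the same size returns the pointer to its starting location.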

2.3 Status Register Operations

The status register bits, SR15-0, contain ten mask bits, SR15-6,
for masking interrupts IR9-0, and six control bits, SR5-0 (see
Bidirectional Data Port, 1.4). The entire status register can be
read or written via the data port, or pushed or popped to/from
the subroutine stack. Upon RESET, the entire status register is
initialized to zero.

RDSR    READ SR

The entire status register (SR15-0) is output over the data port
(D15-0).

WRSR    WRITE SR

Write the data port (D15-0) to the status register (SR15-0).

PSSR    PUSH SR ONTO SS

Increment the SSP and then write the status register to the
subroutine stack.

PPSR    POP SR FROM SS

The top of the subroutine stack is written into the status register,
and then the SSP is decremented.

2.4 Counter Operations

Counters may be pushed and popped to/from the subroutine
stack or loaded directly from the data port. The counters may
be read externally by pushing the counters onto the subroutine
stack then popping the subroutine stack to the data port. The
device has four counters, denoted Ci, which are indexed by the
two LSBs of the instruction.

If a jump is required after N events (until sign), the counter
should be loaded with two less than the number of events desired
(N - 2). If a jump is required for N events (while sign), the
counter is loaded with 2^15 + N - 2 = 8000 hex + N - 2.

Care must be taken when using the counter underflow interrupt
(IR0, see 1.4.3) to clear the sign bit before the IR0 mask bit is
cleared.
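
The two preload rules can be captured in a pair of helper functions. This Python sketch is only a worked form of the arithmetic stated above; the function names are hypothetical, and it does not model how the sequencer counts events.

```python
def preload_until_sign(n_events: int) -> int:
    """Counter value for a jump taken after N events: N - 2."""
    if n_events < 2:
        raise ValueError("this sketch only covers N >= 2")
    return (n_events - 2) & 0xFFFF


def preload_while_sign(n_events: int) -> int:
    """Counter value for a jump taken for N events: 2^15 + N - 2."""
    if n_events < 2:
        raise ValueError("this sketch only covers N >= 2")
    return (0x8000 + n_events - 2) & 0xFFFF


after_ten = preload_until_sign(10)   # jump once ten events have occurred
for_ten = preload_while_sign(10)     # jump for each of ten events
```

In the second case the sign bit starts out set and clears once the counter has been decremented past 8000 hex, which is what ends the "while sign" loop.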

WRCNTR  WRITE Ci

Write to the selected counter, Ci, from the data port.

CLRS    CLEAR SIGN BIT

Clear status register bit SR1.

SETS    SET SIGN BIT

Set status register bit SR1.

PSCNTR  PUSH Ci ONTO SS

Increment the SSP and write the specified counter onto the
subroutine stack.

PPCNTR  POP Ci FROM SS

Transfer the data from the subroutine stack to the counter
specified by the instruction, then decrement the SSP.

DCCNTR  DECREMENT Ci

Unconditionally decrement counter Ci.

IFCDEC  IF CONDITION: DECREMENT C0

Decrement counter C0 on condition. If the sign condition is
selected, the sign is taken from status register bit SR1, rather
than from the counter sign (which normally provides the sign
condition).

Normally, if the counter underflow interrupt (IR0) is enabled, it
is activated by the counter sign bit going HI. However, if IFCDEC
is used to decrement C0, the IR0 interrupt is activated by the
SR1 bit, rather than the sign bit of C0. Since the SR1 bit goes
HI only after C0 has underflowed, IFCDEC must be executed
once more after the C0 underflow to generate the IR0 interrupt.
Alternatively, the preloaded value of C0 may be reduced by one.

2.5 Interrupt Control 

Detailed interrupt operation is described in the Interrupts section
(1.4.3). Here, specific interrupt operations such as interrupt
clearing, IRV read/write, interrupt mask manipulation, etc., are
described.

CCIR    CLEAR CURRENT INTERRUPT

Allows nesting of user interrupts IR8-1 on subsequent instructions
by clearing both the interrupt latch bit currently being serviced
and the interrupt in progress signal (IRIP), re-enabling interrupts.
If an external interrupt is pending, the associated IR vector
will not be output until the cycle following CCIR. Internal
interrupts (IR9 and IR0) are not cleared by CCIR and must be
explicitly cleared through the SLRIVP and CLRS instructions,
respectively.

CAIR    CLEAR ALL INTERRUPTS

Clears external interrupt latches IR8-1, and re-enables the interrupt
interface (IRIP cleared LO). The next sequential instruction
will be executed prior to the jump to a pending interrupt. Internal
interrupts (IR9 and IR0) are not cleared by CAIR and must be
explicitly cleared through the SLRIVP and CLRS instructions,
respectively.

RTNIR   RETURN FROM INTERRUPT

Clears the current interrupt latch for IR8-1, re-enables interrupts
(IRIP cleared LO), and pops the return address from the
subroutine stack. The next sequential instruction will be executed
prior to the jump to a pending interrupt routine. Internal interrupts
are not cleared and the IR9 and IR0 interrupt latches must be
cleared explicitly through the SLRIVP and CLRS instructions,
respectively.

RDIV    READ IRV AND INCREMENT IVP

Outputs the interrupt vector currently pointed to by IVP to the
data port and then increments the IVP. Interrupts should be
disabled when writing or reading interrupt vectors.

WRIV    WRITE IRV AND INCREMENT IVP

Writes the interrupt vector currently pointed to by the IVP
from the data port and then increments the IVP. Interrupts
should be disabled when writing or reading interrupt vectors.

IRMBC   IR MASK BITWISE CLEAR

Allows selected IR mask bits to be cleared. Data port bits D15-6
are applied to status register bits SR15-6 (corresponding to
mask bits for IR9-0). Those data bits which are HI will clear
the mask bit, while those data bits which are LO will leave the
mask bit intact. Data port bits D5-0 are ignored.
Bidirectional Data Port, 1.4). The entire status register can be
read or written via the data port, or pushed or popped to/from
the subroutine stack. Upon RESET, the entire status register is
initialized to zero.

RDSR READ SR

The entire status register (SR15-0) is output over the data port.

WRSR WRITE SR

Write the data port (D15-0) to the status register (SR15-0).

PSSR PUSH SR ONTO SS

Increment the SSP and then write the status register to the
subroutine stack.

PPSR POP SR FROM SS

The top of the subroutine stack is written into the status register,
and then the SSP is decremented.

2.4 Counter Operations

Counters may be pushed and popped to/from the subroutine
stack or loaded directly from the data port. The counters may
be read externally by pushing the counters onto the subroutine
stack then popping the subroutine stack to the data port. The
device has four counters, denoted Cn, which are indexed by the
two LSBs of the instruction.

If a jump is required after N events (until sign), the counter
should be loaded with two less than the number of events desired
(N - 2). If a jump is required for N events (while sign), the
counter is loaded with 2^15 + N - 2 = 8000 (hex) + N - 2.

Care must be taken when using the counter underflow interrupt
(IR0, see 1.4.3) to clear the sign bit before the IR0 mask bit is
cleared.

WRCNTR WRITE Cn

Write to the selected counter, Cn, from the data port.

CLRS CLEAR SIGN BIT 

Clear status register bit SR}. 

SETS SET SIGN BIT 

Set status register bit SRj. 

PSCNTR PUSH Cn ONTO SS

Increment the SSP and write the specified counter onto the
subroutine stack.

PPCNTR POP Cn FROM SS

Transfer the data from the subroutine stack to the counter
specified by the instruction, then decrement the SSP.

DCCNTR DECREMENT Cn

Unconditionally decrement counter Cn.

IFCDEC IF CONDITION: DECREMENT C0

Decrement counter C0 on condition. If the sign condition is
selected, the sign is taken from the status register sign bit, rather
than from the counter sign (which normally provides the sign
condition).

Normally, if the counter underflow interrupt (IR0) is enabled, it
is activated by the counter sign bit going HI. However, if IFCDEC
is used to decrement C0, the IR0 interrupt is activated by the
status register sign bit, rather than the sign bit of C0. Since the
status register sign bit goes HI only after C0 has underflowed,
IFCDEC must be executed once more after the C0 underflow to
generate the IR0 interrupt. Alternatively, the preloaded value of
C0 may be reduced by one.

2.5 Interrupt Control 

Detailed interrupt operation is described in the Interrupts section
(1.4.3). Here, specific interrupt operations such as interrupt
clearing, IRV read/write, interrupt mask manipulation, etc., are
described.

CCIR CLEAR CURRENT INTERRUPT

Allows nesting of user interrupts IR8-1 on subsequent instructions
by clearing both the interrupt latch bit currently being serviced
and the interrupt in progress signal (IRIP), re-enabling interrupts.
If an external interrupt is pending, the associated IR vector
will not be output until the cycle following CCIR. Internal
interrupts (IR9 and IR0) are not cleared by CCIR and must be
explicitly cleared through the SLRIVP and CLRS instructions,
respectively.

CAIR CLEAR ALL INTERRUPTS 

Clears external interrupt latches IR8-1 and re-enables the interrupt
interface (IRIP cleared LO). The next sequential instruction
will be executed prior to the jump to a pending interrupt. Internal
interrupts (IR9 and IR0) are not cleared by CAIR and must be
explicitly cleared through the SLRIVP and CLRS instructions,
respectively.

RTNIR RETURN FROM INTERRUPT

Clears the current interrupt latch for IR8-1, re-enables interrupts
(IRIP cleared LO), and pops the return address from the
subroutine stack. The next sequential instruction will be executed
prior to the jump to a pending interrupt routine. Internal interrupts
are not cleared and the IR9 and IR0 interrupt latches must be
cleared explicitly through the SLRIVP and CLRS instructions,
respectively.

RDIV READ IRV AND INCREMENT IVP 

Outputs the interrupt vector currently pointed to by IVP to the
data port and then increments the IVP. Interrupts should be
disabled when writing or reading interrupt vectors.

WRIV WRITE IRV AND INCREMENT IVP

Writes the interrupt vector currently pointed to by the IVP
from the data port and then increments the IVP. Interrupts
should be disabled when writing or reading interrupt vectors.

IRMBC IR MASK BITWISE CLEAR

Allows selected IR mask bits to be cleared. Data port bits D15-6
are applied to status register bits SR15-6 (corresponding to
mask bits for IR9-0). Those data bits which are HI will clear
the mask bit, while those data bits which are LO will leave the
mask bit intact. Data port bits D5-0 are ignored.

bits for IR9-0). Those data bits which are HI will set the mask
bit, while those bits which are LO will leave the mask bit
intact. Data port bits D5-0 are ignored.
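
The bitwise clear and set behaviour described above can be modelled in a few lines. This is an illustrative sketch only: the function names are hypothetical, the status register is treated as a plain 16-bit integer, and the pairing of SR15 with IR9 down to SR6 with IR0 is an assumption read from the order of the ranges in the text.

```python
MASK_FIELD = 0xFFC0  # SR15-6 hold the IR mask bits; D5-0 are ignored


def mask_bitwise_clear(sr: int, data: int) -> int:
    """HI data bits in D15-6 clear the matching mask bits; LO bits leave them."""
    return sr & ~(data & MASK_FIELD) & 0xFFFF


def mask_bitwise_set(sr: int, data: int) -> int:
    """HI data bits in D15-6 set the matching mask bits; LO bits leave them."""
    return (sr | (data & MASK_FIELD)) & 0xFFFF


def mask_bit(ir: int) -> int:
    """Assumed mask bit position for IRn (IR0 -> SR6 ... IR9 -> SR15)."""
    if not 0 <= ir <= 9:
        raise ValueError("interrupt number must be in 0..9")
    return 1 << (ir + 6)


if __name__ == "__main__":
    # Set the mask bits for IR9 and IR0; the low six data bits have no effect.
    sr = mask_bitwise_set(0x0000, mask_bit(9) | mask_bit(0) | 0x003F)
    print(hex(sr))
    # Clear only the IR0 mask bit again.
    print(hex(mask_bitwise_clear(sr, mask_bit(0))))
```

Because only HI data bits act, one data word can change any subset of mask bits without a read-modify-write of the status register.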

DISIR DISABLE INTERRUPTS

Disables the execution of all further interrupts by clearing the
enable interrupt bit (SR1). External interrupts continue to be
latched.

ENAIR ENABLE INTERRUPTS

Enables all interrupts by setting the enable interrupt
bit (SR1).

SLIR SELECT LATCHED INTERRUPTS

Places the interrupt latches in the latched mode for
interrupts IR8-1 (SR0 LO). Interrupts are latched if they are
valid at the clock edge. Interrupts IR8-5 are latched
at the positive going clock edge while IR4-1 are latched at the
negative going clock edge.

STIR SELECT TRANSPARENT INTERRUPTS

Places the interrupt request latches in the transparent mode
(SR0 HI) for interrupts IR8-1. The interrupt request is only
valid while the external interrupt inputs are high. Interrupts are
still processed on the next cycle, so long as they meet the minimum
interrupt setup specification. Note that selecting transparent
interrupting will clear any pending interrupts stored in the
interrupt latch.

SLRIVP WRITE SLR WITH D5-2
AND IVP WITH D15-12

Loads the 4-bit stack limit register (SLR) and the 4-bit interrupt
vector pointer (IVP) from the data port. This instruction also
clears the stack overflow interrupt request IR9.

For stack overflow detection, the entire 6-bit stack pointer
(SSP, LSP or GSP) is compared to a 6-bit word comprised of
the 4-bit SLR (MSBs) and the two LSBs determined by the
instruction type, as follows:

'00' for subroutine stack push (PSDSS); or,

'11' for register stack push (PSDRS).

For example, if a stack limit of 36 (decimal) and positioning of the
IVP at IRV7 is desired, the value '0111xxxxxx1001xx' is provided at
the data port. Note that the SLR and IVP cannot be read.
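
The data word in the example can be assembled as follows. This is a sketch under stated assumptions: the example pattern is read as IVP in D15-12 and SLR in D5-2, the stack limit is taken to be a multiple of four (the SLR supplies only the four MSBs of the 6-bit comparison word), don't-care bits are written as 0, and both function names are hypothetical.

```python
def slrivp_word(stack_limit: int, ivp: int) -> int:
    """Build the 16-bit data word: IVP in D15-12, SLR in D5-2 (assumed layout)."""
    if not 0 <= ivp <= 15:
        raise ValueError("IVP is a 4-bit field")
    if not 0 <= stack_limit <= 60 or stack_limit % 4:
        raise ValueError("stack limit assumed to be a multiple of 4 in 0..60")
    slr = stack_limit >> 2            # four MSBs of the 6-bit limit
    return (ivp << 12) | (slr << 2)   # don't-care bits left at 0


def overflow_compare_word(slr: int, register_stack_push: bool) -> int:
    """6-bit word compared with the stack pointer: SLR plus '00' or '11'."""
    return ((slr & 0xF) << 2) | (0b11 if register_stack_push else 0b00)


if __name__ == "__main__":
    word = slrivp_word(stack_limit=36, ivp=7)
    print(format(word, "016b"))
    print(overflow_compare_word(9, register_stack_push=False))
```

With a stack limit of 36 the SLR field holds 9 (binary 1001), which matches the '1001' group in the example value.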

The interrupt vector pointer (IVP) addresses the vector file for
reading or writing interrupt vectors. To write interrupt vectors
IRV9-0, the IVP must first be initialized by SLRIVP. The
WRIV instruction (see above) is then used to write the interrupt
vector pointed to by the IVP, which is then incremented
automatically.

2.6 Relative Address Width Controls

The width control instructions allow reduction of microcode
when Jump Data Relative and Jump Subroutine Relative
instructions need less than the full 16-bit range. Use these
instructions to sign extend the 8, 12 or 16-bit wide jump data
presented at the data port. The jump width may be selected by
the explicit instructions or by directly setting the status register
bits SR5-4 as described below. Any of these three instructions
will disable the Instruction Hold Control mode (see Misc.
Instructions - IHC, 2.7).

Note: selection of 8-bit width can be made with or without
IHC. For all relative jumps, the jump distance is the offset + 1.

REL16 SELECT 16-BIT RELATIVE JUMPS

Selects a 16-bit relative jump. This adds D15-0 at the data port
to the PC to obtain the jump address. The status bits SR5-4 are
set to '00'.

REL12 SELECT 12-BIT RELATIVE JUMPS

Selects the jump data from D11-0. The offset is sign-extended
allowing relative jumps in the range +2047 to -2048. The
status bits SR5-4 are set to '11'.

REL8 SELECT 8-BIT RELATIVE JUMPS

Selects the jump data from D7-0. The offset is sign-extended
allowing relative jumps in the range +127 to -128. The status
bits SR5-4 are set to '01'.
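
The sign extension applied to 8-, 12- and 16-bit jump data can be illustrated with a short model. The function names are hypothetical and the code is only a sketch of the arithmetic; the "offset + 1" jump distance is taken from the note in this section.

```python
def sign_extend(data: int, width: int) -> int:
    """Interpret the low `width` bits of `data` as a twos-complement offset."""
    if width not in (8, 12, 16):
        raise ValueError("width must be 8, 12 or 16")
    field = data & ((1 << width) - 1)   # keep D(width-1)..D0 only
    sign = 1 << (width - 1)
    return (field ^ sign) - sign


def jump_distance(data: int, width: int) -> int:
    """Jump distance for a relative jump: the sign-extended offset plus one."""
    return sign_extend(data, width) + 1


if __name__ == "__main__":
    for width in (8, 12, 16):
        lo = sign_extend(1 << (width - 1), width)
        hi = sign_extend((1 << (width - 1)) - 1, width)
        print(width, lo, hi)
```

The extreme values reproduce the ranges quoted above: -128 to +127 for 8-bit data and -2048 to +2047 for 12-bit data.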

2.7 MisceUaneous Instnictioiis 

CONT CONTINUE 

Increment and output the next location in microcode memory
without any other changes. Allows straight line microcode
execution.

IDLE DISABLE OUTPUTS AND JUMP PC

Places the address port into the high-impedance state, inhibiting
program counter (PC) increments. Useful in applications where
multiple sequencers share a common microcode address bus.

This instruction causes the ADSP-1401 to behave as if the clock 
had stopped. The IDLE instruction may be latched internally 
by using IHC, freeing microcode for use by another device. 

External interrupt requests must be inhibited during IDLE. If
interrupts are not inhibited, the ADSP-1401 will attempt to
process an interrupt that goes active. However, it will be unable
to output an interrupt vector because the IDLE instruction
places the address port in the high-impedance state; more
importantly, it will set its IRIP flag, which will inhibit further
interrupt processing even after the IDLE state is exited.

Interrupts can be inhibited using the interrupt mask or the
DISIR instruction. While inhibited, interrupt requests will still
be latched in the interrupt latch.

IHC ENABLE INSTRUCTION HOLD CONTROL

Sets SR5-4 to '10' and redefines the function of IR1 to allow a
subsequent instruction to be held for repeated execution, regardless
of the instruction port. Use of the IHC mode requires that the
mask bit for IR1 be set. See Instruction Hold Control, appendix
4.1 for more details.

While in the IHC mode, asserting IR1 HI (prior to the second
half-cycle of any instruction) will hold that instruction and
disable all interrupts (although they continue to be latched)
until IR1 is brought LO again (again, prior to the second half-cycle
of any instruction).

It is recommended that IR1 be dedicated to control of the IHC
mode (if needed). However, if it must also be used for subsequent
interrupting, then the CAIR instruction should be executed
before unmasking IR1 (to clear the interrupt request resulting
from use of IR1 as the IHC control).

SPECIFICATIONS
This section describes the ADSP-1401's
The Specifications Table lists the device
switching characteristics, while Figure 7
timing diagram.

Figure 8. Three-State Reference Levels

WORD-SLICE™ MICROCODED SYSTEM WITH ADSP-1410


GENERAL INFORMATION

The ADSP-1410 is a fast, flexible address generator optimized
for digital signal processing and other high-performance
computers. This low-power CMOS device rapidly generates the
data memory addresses required by routines such as digital
filters, FFTs, matrix operations, and DMAs. With its 16-bit
architecture, registers, dual ports, and speed, this Word-Slice
component improves performance and reduces board space
substantially relative to bit-slice solutions.

The ADSP-1410's architecture features a 16-bit ALU, a
comparator, and thirty 16-bit registers. The registers are organized
into four files: sixteen address (R) registers, six offset (B) registers,
four compare (C) registers, and four initialization (I) registers.

The ADSP-1410 rapidly executes key address generating
operations. In a single instruction cycle, the device can:

• output a 16-bit memory address;

• modify this memory address; and,

• detect when the address value has moved to or beyond a
  pre-set boundary and conditionally loop back to the
  top of a circular buffer.

Consequently, circular buffers and modulo addressing for data
memories can be implemented without overhead.
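
The three per-cycle steps listed above (output, modify, compare and loop back) can be pictured with a small behavioural model. This is a simplified sketch, not the device's exact cycle timing: it assumes the normal pre-update mode with a positive offset, compares the output address against the boundary, and uses hypothetical names throughout.

```python
def address_stream(start, offset, boundary, init, count):
    """Return `count` addresses as one address register would output them.

    Each step outputs the current R value, then either re-initializes R
    from `init` (when the output address has reached `boundary`) or
    advances R by `offset`. Values are kept to 16 bits.
    """
    r = start & 0xFFFF
    outputs = []
    for _ in range(count):
        outputs.append(r)              # address driven onto the Y port
        if r >= boundary:              # compare flag active
            r = init & 0xFFFF          # loop back to the top of the buffer
        else:
            r = (r + offset) & 0xFFFF  # normal update by the offset
    return outputs


if __name__ == "__main__":
    # Four-word circular buffer at 0x100..0x103, stepped by 1.
    print([hex(a) for a in address_stream(0x100, 1, 0x103, 0x100, 6)])
```

In this model the boundary holds the last address of the buffer, so the address wraps to the top with no extra instruction in the loop.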

The ADSP-1410's 10-bit microcode instructions include
commands for looping, register read/writes, internal data
transfers, and logical/shift operations. Instructions are normally
supplied from an external source. However, an internal Alternate
Instruction Register (AIR) can provide the instruction under external
control, allowing microcode to be conserved in many
applications.

Word-Slice is a trademark of Analog Devices, Inc.


The ADSP-1410 has a 16-bit address (Y) port for outputting 
addresses and a 16-bit data (D) port for I/O between internal 
and external registers. Also, an internal path allows external
data, provided via the D port, to serve as an ALU source and/or
to be directly output over the Y port for a DMA capability.

Double-precision (30-bit), single-cycle addressing can be
performed by cascading two ADSP-1410's, with the MSB of each
chip's D and Y port dedicated to interchip communication.
Alternatively, a single AG can provide double-precision addresses
at a rate of one per two clock cycles.


The Look-Ahead pipeline eliminates the need for an external
microcode pipeline register by internally latching instructions
and addresses; microcode bits may be directly routed to the
ADSP-1410 from microcode memory. Logically, the Look-Ahead
pipeline is split into two halves: the first, located at the instruction
(and data) port; and the second, located at the address port.

Each half of the pipeline (input vs. output) has a transparent
latch which operates out of phase with the other: the output
latch is transparent during the first half of the cycle (clock HI),
while the input latches (instruction and data) are transparent
during the second half of the cycle (clock LO). This complementary
arrangement allows new instructions to be decoded (in preparation
for the following cycle) while the program address for the current
cycle is held steady.

However, many Sp' ?[* “““ -bis instruction, 
the device and speUed out in Mowi^^l"„7‘”'*"' ” 



ADDRESS SOURCES

- Sixteen internal R registers
- External data provided over the D port

OFFSET SOURCES

- Six internal B registers
- Data Port

OFFSET OPERATIONS

- Increment
- Decrement
- Add Offset
- Subtract Offset
- Single Bit Left/Right Shifts

CONDITIONAL RE-INITIALIZATION

- Independent Inhibit/Enable for each of four
  initialization registers
- Conditional AIR execution (used for true
  modulo addressing)

OUTPUT/UPDATE SEQUENCE

- Normal (Pre-Update) Mode (output the address
  before update)
- Post-Update Mode (output the address after
  update)

PRECISION

- Single chip supplies 16-bit addresses
- Two chips cascaded provide 30-bit addresses
- Single chip provides 30-bit addresses in two
  cycles


(Rn-#-- Rn-t* I ) 
Kd-BJ 


lANiEORyXORj 




PIN NAME     DESCRIPTION

Y15 - Y0     The address (Y) output port. In single-chip
             double-precision mode, the MSB (Y15) indicates whether
             the supplied address is the MSW or LSW (see
             Precision Modes). In two-chip double-precision
             mode, the MSB conveys the carry/shift bit from
             the Least Significant (LS) to the Most Significant
             (MS) chip.

D15 - D0     The bi-directional data (D) port. In two-chip
             double-precision addressing mode, the MSB (D15) of
             this port conveys CMP status from the partner
             chip.

I9 - I0      The instruction port.

CMP/Z        A dual function pin. Looping instructions, which
             compare address register values to compare
             register values, assert this pin HI to convey
             CMP status if i) R ≥ C for positive offsets, or
             ii) R ≤ C for negative offsets. Logical/Shift
             instructions assert this pin HI to convey the ZERO
             status of the result.

DSEL         Data Select control. Asserting this control HI
             causes data set up on the data port to substitute
             for the R value specified in the instruction.

AIR Enable   Alternate Instruction Register control. Asserting
             this control HI causes the device to execute an
             instruction stored in the internal AIR, rather
             than the instruction set up on the instruction
             port.

CLK          Clock

VDD          +5 Volt Power Supply

GND          Ground


Figure 1. ADSP-1410 Functional Block Diagram

ARCHITECTURE

After discussing the architecture of the ADSP-1410, different
operating modes of the ADSP-1410 are detailed, followed by a
description of the ADSP-1410's method of operation, including
timing concerns and instructions. Brief applications information
is then presented, and the data sheet concludes with a section
on MNEMONICS AND OPCODES.

The ADSP-1410's architecture (Figure 1) features four register
files, an ALU, a Comparator, an Alternate Instruction Register
(AIR), and a Control register. External interfaces include a
10-bit instruction port, 16-bit data (D) and address (Y) ports, a
DSEL (Data Select) control pin, an AIR Enable control pin,
and a status flag.

Instruction Port 

The microcode controlling the ADSP-1410 is supplied over the
10-bit wide INSTRUCTION PORT. The instruction word,
I9-0, is latched prior to the instruction decoder during phase one
(clock HI) and is passed during phase two (clock LO). In addition
to the microcode, two dedicated control pins affect the device's
operation: the DSEL pin (see Y Port, D Port, and DSEL Control
Pin); and the AIR Enable pin (see Alternate Instruction Register
and AIR Enable). These pins are considered instruction bits,
and latched as described above.

Y Port, D Port, and DSEL Control Pin 
The ADSP-1410 has two 16-bit ports: a DATA (D) PORT and
an ADDRESS (Y) PORT. The output drivers of both ports are
three-state disabled unless an instruction specifies an output.

Addresses supplied to external data memory are output over the
unidirectional Y port. The address supplied may come from one
of three sources: an internal address (R) register, the data (D)
port, or the ALU. The DSEL (Data Select) pin controls whether
an R register (DSEL LO) or external data (DSEL HI) is the
address source. The address source can either be directly output
over the Y port, or passed through the ALU for modification
prior to output (see Pre-Update Mode versus Post-Update Mode).
Hardware three-state output control of the Y port is possible
(see note in "Alternate Instruction Register and AIR Enable"
section). Finally, the address being output (direct or modified
source) may be bit-reversed (see Bit Reverser).

The Y port has two modes of operation (see Transparent Mode 
versus Latched Mode). In the more commonly used latched 
mode, addresses are latched during phase two (clock LO). The
transparent mode disables the output latch and may be used in 
conjunction with stopping the clock LO, allowing data to be 
passed through (directly, or modified by the ALU) the AG 
without performing updates. 

Any internal register may be read or written via the ADSP-1410’s 
D port. Also, external data can be supplied to the chip over this 
port for immediate addressing purposes. 

Note:

The ADSP-1410 may power-up driving the data bus. Caution
should be used to avoid creating a bus contention with other
devices which may be sharing this bus. To prevent bus contention,
the CLK input may be forced LO during power-up (disabling
the output data drivers). During this time, a RESET instruction
should be set up at the instruction port to be executed as the
first operation when the clock starts up.


Registers

The ADSP-1410 has 30 16-bit registers, organized into four
banks. Single-cycle transfers between certain register banks are
supported.

Sixteen ADDRESS (R) REGISTERS hold memory address
pointers. In the same cycle that a 16-bit R value is output over
the address (Y) port, it may be incremented, decremented,
offset, modified by a logical operation, or left/right shifted by
one bit. The updated value is then written back into the original
R location (pre-update mode). In post-update mode, the address
is output after being modified. Any R value (or data, using
DSEL) may be bit-reversed on output.

Six OFFSET (B) REGISTERS furnish a second operand to the
ALU (the other, provided by an R register or the data bus) for
modifying the address to be output. The B registers are partitioned
into two, user-selectable (see Control Register: B Bank Select)
banks and external data can substitute as an offset value whenever
B3 (bank one) or B7 (bank two) is used (see Table IV).

Four COMPARE (C) REGISTERS supply one source to the on- 
chip comparator, whose other source is the address being output. 
When an address moves to or beyond a boundary set by the C 
value, the CMP flag goes active (HI). 

Four INITIALIZATION (I) REGISTERS can, conditional on
the CMP flag going active, overwrite any R value, allowing
overhead-free branches to the top of an addressing loop. Note
that I and C registers are always paired. Conditional re-initializing
of R registers may be independently inhibited for individual I
registers (see Control Register CR3-0).

ALU and Shifter 

The ADSP-1410’s 16-BIT ALU performs adds, subtracts, and 
logical operations. Usually, one source is an offset (B) register, 
while the other is an address (R) register. However, external 
data provided via the D port may substitute either for an R
register (under the control of the DSEL pin), or a B register
(using B3 or B7).

For two-chip/double-precision ALU operations, CARRIES into
the MS chip and out of the LS chip (CSin and CSout) are conveyed
via the Y15 pin (see Precision Modes).

The ALU also contains the logic required for single-bit SHIFTS
of a supplied R register. Left shifts are logical, while right shifts
are arithmetic. In two-chip/double-precision shift operations, the
Y15 pin conveys the shifted bit. In single-precision operation,
the carry/shift status of the device cannot be monitored.

The destination of an ALU or shift result is always the source R
register location specified in the instruction, even if external
data is the source. If the post-update mode is used, the ALU/shift
result is sent directly over the address (Y) port on the current
cycle (in addition to being returned to the source R location).

Alternate Instruction Register and AIR Enable
The ALTERNATE INSTRUCTION REGISTER (AIR) is a
10-bit register which may be loaded with any instruction. On
any cycle that the AIR Enable pin is asserted, the device will
execute the instruction held in the AIR, rather than the instruction
set up on the instruction pins (except for the RST instruction).

The AIR's principal purpose is to conserve microcode. One way
to conserve microcode is to load a frequently-used instruction
(e.g., a looping instruction) into the AIR. Then, this instruction
is executed simply by asserting the device's AIR Enable pin,
temporarily suspending the need for external microcode.

The AIR can also conserve microcode in applications using
multiple AGs (e.g., double-precision or high-throughput systems).
If the AGs generally execute identical instructions, external
microcode may be significantly reduced if they share a common
microcode instruction field. During some cycles, however, it
may be crucial for an AG to execute an instruction different
from the common instruction, something which the AIR and
its enable pin allow. For example, a NOP instruction can be
loaded into an AG's AIR; anytime the AIR Enable pin is asserted,
the AG will be selectively "put to sleep" (I/O pins three-state
disabled; no change in internal state).

The AIR register may be read over the data port (D9-0) in a
single cycle. As Table I shows, the AIR may be written via the
data port or the instruction port. If the instruction written
into the AIR is provided via the instruction port, two cycles are
required. This method allows the AIRs of two or more AGs
sharing microcode to be selectively loaded by differentially
asserting their DSEL pins. Note that if the DSEL pin is LO
during the entire second phase (clock LO) of the LDA instruction,
no AIR loading occurs. This implicitly requires that DSEL be
set up accordingly prior to the start of the LDA instruction, as it
is latched during phase one (clock HI).

INSTRUCTION LOADED INTO THE AIR VIA THE:

DATA PORT                        INSTRUCTION PORT

1. Execute "Write AIR" instr.    1. Execute "Load AIR" instr.
                                 2. Provide instr. on instr. port
                                    and assert DSEL pin.

Table I. Options for Reading and Writing the AIR


A second method exists for executing the instruction in the
AIR. Looping instructions compare an address (R) value to a
compare (C) value and, if the address has moved to or beyond a
pre-set boundary, the CMP flag goes HI. If CR10 (see Control
Register and Conditional AIR Execute Mode) is set, a true
comparison causes the device to execute its next instruction
from the AIR (see Table III). This capability facilitates no-overhead
modulo addressing (see application note: Modulo Addressing).

Note:

The AIRE pin may be used to control the Y port output drivers
by loading a NOP into the AIR register; the AIRE pin becomes
dedicated to three-state control of the Y port. This technique
supports connection of multiple address sources to the same
bus.

Flags and Comparator 

The ADSP-1410 has two internal flags, CMP and ZERO, that
share the external CMP/Z pin. The CMP flag, set by the
comparator, is affected by looping instructions. The ZERO flag is
set whenever a Logical Shift instruction has a zero result. In
cycles that do not affect the CMP or ZERO flag, the CMP/Z
flag pin defaults LO.

As Table II shows, the CMP flag goes HI whenever the supplied
address moves to or beyond a boundary set by the specified C
register. The address that is compared to the C value is always
the address that is output, even in post-update mode. R, C,
and B values are treated as unsigned integers by the
Comparator.


Twos-Complement Offsets

Negative offsets are generally handled by the R ← R - B
instruction. However, if for some reason the user is interpreting
offset values as negative twos complement numbers, the instruction
R ← R + B will cause the comparator to sense whether R ≥ C
(when the condition R ≤ C is of interest). The user may account
for this reversal (e.g., by monitoring for the CMP flag going
LO, rather than HI), but looping instructions cannot be fully
utilized.


ARITHMETIC OPERATION              CMP FLAG HIGH IF:

R ← R + 1 (YINC instruction)      R ≥ C
R ← R - 1 (YDEC instruction)      R ≤ C
R ← R + B (YADD instruction)      R ≥ C
R ← R - B (YSUB instruction)      R ≤ C

Table II. CMP Flag Truth Table
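
The truth table can be restated as a small function. This is a sketch only: it assumes the comparison senses are "greater than or equal" for the incrementing and adding instructions and "less than or equal" for the decrementing and subtracting ones (consistent with "moves to or beyond"), treats R and C as unsigned 16-bit values as stated in the text, and uses a hypothetical function name.

```python
UPWARD = {"YINC", "YADD"}     # address moves toward higher values
DOWNWARD = {"YDEC", "YSUB"}   # address moves toward lower values


def cmp_flag(instruction: str, r: int, c: int) -> bool:
    """Return True when the CMP flag would be HI for the given instruction."""
    r &= 0xFFFF   # comparator treats values as unsigned integers
    c &= 0xFFFF
    if instruction in UPWARD:
        return r >= c
    if instruction in DOWNWARD:
        return r <= c
    raise ValueError("not a looping arithmetic instruction: " + instruction)


if __name__ == "__main__":
    print(cmp_flag("YINC", 0x0103, 0x0103))
    # A twos-complement "negative" offset added with YADD still uses the
    # upward sense, which is the reversal described in the text.
    print(cmp_flag("YADD", 5, 3), cmp_flag("YSUB", 5, 3))
```

The second call shows why a negative twos-complement offset with YADD reports the opposite sense to the one a downward-moving address needs.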


Alternating Offsets 

If the microprogram switches between different offsets and the 
AG is in the normal, pre-update mode, the comparator logic
may produce seemingly erroneous results because comparisons
are not made until the cycle following the update. In pre-update
mode, when a routine switches between positive and negative
offsets, the comparator will check for the wrong condition because
the comparison is not made until the following cycle. The value
in the compare register must anticipate the comparator sense
reversal by one cycle.

Bit Reverser 

Addresses can be bit-reversed as they are output, which is useful
in algorithms such as the Fast Fourier Transform. The bit-reverse
mapping is as follows, where Ki and Yj denote the bits of K
(either an address register or the data bus) and Y (the address
port), respectively.

K0 → Y15
K1 → Y14
...
K15 → Y0

Bit reversal only affects the value that appears on the address
port; it does not affect the value returned to the R register
location. The hardware bit reverser operates only on
single-precision, 16-bit addresses. For details on software reversal of
N-bit (N<16) fields, see the application note: Variable-Width Bit
Reversing.
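
The mapping of bit Ki onto bit Y(15-i) can be written out directly. The sketch below is a software illustration only, with a hypothetical function name; the `width` parameter is an addition for showing how a narrower N-bit field could be reversed in software and is not taken from the application note.

```python
def bit_reverse(value: int, width: int = 16) -> int:
    """Reverse the low `width` bits of `value` (bit i moves to bit width-1-i)."""
    if not 1 <= width <= 16:
        raise ValueError("width must be in 1..16")
    result = 0
    for i in range(width):
        if value & (1 << i):
            result |= 1 << (width - 1 - i)
    return result


if __name__ == "__main__":
    print(hex(bit_reverse(0x0001)))          # K0 appears on Y15
    print(hex(bit_reverse(0x1234)))
    # Index order for an 8-point FFT (3-bit field).
    print([bit_reverse(n, 3) for n in range(8)])
```

The last line gives the familiar bit-reversed ordering used to address the inputs or outputs of a radix-2 FFT.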

Control Register 

The ADSP-1410's 11-bit CONTROL REGISTER (CR10-0) may
be read or written via the device's data port, D10-0. Dedicated
instructions are used to read or write the entire control register,
or to set and clear individual bits (see Instruction Group 4). On
power-up, the RST instruction clears the control register to all
zeros automatically.

The following list shows the control register bits. If the
bit(s) is set (HI), the specified mode is active.

CR bit   Mode

3-0      Initialization Mask: For looping instructions, enables
         conditional re-initialization of R registers with I
         registers. For example, setting bit 2 of the CR allows I2
         to re-initialize the selected R register if the address
         has moved to or beyond the boundary set by C2.

5-4      Precision Select:
         00 = single-precision mode;
         01 = double-precision mode, LS chip;
         10 = double-precision mode, MS chip;
         11 = double-precision mode, single-chip.

6        Transparent Mode: Sets the address (Y) port to the
         transparent mode; otherwise, the Y port is latched
         during phase two.

7        R Bank Select: Selects the upper eight R registers as
         address sources for the YADD and YSUB instructions.

8        B Bank Select: Selects the upper four B registers as
         offset sources for all instructions.

9        Post-update Mode: Sets the post-update mode (addresses
         supplied after updating).

10       Conditional AIR Execute: Sets the conditional AIR
         mode, in which a looping instruction whose CMP status
         goes true causes the next instruction to be fetched from
         the AIR rather than the instruction port. Using this
         mode disables conditional re-initialization (of R by I
         on CMP) and forces the default update of R.
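
The control register layout listed above lends itself to a small decoder. This is a sketch for illustration: the field positions follow the list as read here, while the dictionary keys and function name are hypothetical and not part of the device documentation.

```python
PRECISION_MODES = {
    0b00: "single-precision",
    0b01: "double-precision, LS chip",
    0b10: "double-precision, MS chip",
    0b11: "double-precision, single-chip",
}


def decode_control_register(cr: int) -> dict:
    """Split an 11-bit control register value into its mode fields."""
    cr &= 0x7FF  # only CR10-0 exist
    return {
        # CR3-0: one enable per initialization register I0..I3
        "init_enable": [bool((cr >> n) & 1) for n in range(4)],
        "precision": PRECISION_MODES[(cr >> 4) & 0b11],   # CR5-4
        "transparent": bool((cr >> 6) & 1),               # CR6
        "r_bank_upper": bool((cr >> 7) & 1),              # CR7
        "b_bank_upper": bool((cr >> 8) & 1),              # CR8
        "post_update": bool((cr >> 9) & 1),               # CR9
        "conditional_air": bool((cr >> 10) & 1),          # CR10
    }


if __name__ == "__main__":
    # Post-update mode, LS chip of a two-chip pair, I2 re-initialization on.
    for name, value in decode_control_register(0x214).items():
        print(name, value)
```

A value of all zeros, as left by the RST instruction, decodes to single-precision, latched, pre-update operation with every optional mode off.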


ADSP-1410 OPERATING MODES

The flexibility of the ADSP-1410 is enhanced by several optional
modes of operation. These modes, governed by the control
register, are discussed in detail in this section.

Precision Modes

Typically, the ADSP-1410 provides single-precision (16-bit)
addresses. If greater addressing range is needed, double-precision
(30-bit) addresses can be supplied. Two double-precision modes
are supported: one with two chips cascaded and the other with
a single chip. Specific instructions set these modes.
Double-precision (single- or two-chip) bit-reversing is not supported.

Two-Chip/Double-Precision

(CR5-4 = '01' for LS chip, '10' for MS chip). In this mode,
two ADSP-1410's are cascaded to generate double-precision
addresses at a rate of one per cycle. Each address may be output,
incremented, decremented or modified by an offset value,
compared to a double-precision value, and conditionally re-initialized
by a double-precision word. Alternately, double-precision logical/
shift operations may be performed.

The Y and D ports of each chip are restricted to the lower 15
bits, freeing the MSBs of both devices to convey carry/shift and
CMP status, respectively (see Figure 2). For double-precision
adds/subtracts, the LS chip sends carry/borrow status over the
Y15 pin; the MS chip uses Y15 to accept carry/borrow status
from the LS. For left (right) shifts, the LS (MS) conveys the
shifted bit over the Y15 pin.

Double-precision, conditional re-initializations are implemented
by dedicating the D15 pin on each device's data port to receive
the CMP status from the other. When performing a looping
instruction, the MS chip generates a valid CMP flag on its CMP/Z
output. For a logical or shift instruction, the CMP/Z outputs
of the two chips must be ANDed to produce a
single valid ZERO flag. To ensure that this flag is valid on the
next low-to-high transition of the clock, the output of the AND
gate should be latched as shown in Figure 2. The ZERO flag is
latched on the falling edge of the clock and held by the latch
until the next falling edge.

Figure 2. Valid Two-Chip Double-Precision ZERO
(Logical instructions)

In this mode, all values are 15-bit words. The 30-bit address is
supplied in two 15-bit words over the Y14-0 pins of the two
devices. Internally, the MS bit of each operand is zeroed prior
to ALU operations, the MSB of the result then becoming the
carry/shift bit. External data provided over the D port must be
segmented with the 15 LSBs going to the LS chip and the 15
MSBs to the MS chip.
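
The 15-bit segmentation and carry hand-off can be sketched in software. The model below is illustrative only: it assumes the carry out of the LS chip is simply bit 15 of its 16-bit sum (operand MSBs zeroed, as described above) and that the MS chip adds that bit to its own 15-bit sum; the function names are hypothetical.

```python
MASK15 = 0x7FFF


def split30(value: int) -> tuple:
    """Split a 30-bit value into (MS word, LS word), 15 bits each."""
    return (value >> 15) & MASK15, value & MASK15


def add30(address: int, offset: int) -> int:
    """Add two 30-bit values the way a cascaded LS/MS pair would."""
    a_ms, a_ls = split30(address)
    o_ms, o_ls = split30(offset)
    ls_sum = a_ls + o_ls                 # LS chip: MSB of result is the carry
    carry = (ls_sum >> 15) & 1           # conveyed from LS chip to MS chip
    ms_sum = a_ms + o_ms + carry         # MS chip accepts the carry
    return ((ms_sum & MASK15) << 15) | (ls_sum & MASK15)


if __name__ == "__main__":
    print(hex(add30(0x7FFF, 1)))         # carry ripples into the MS word
    print(split30(0x8000))
```

Each chip only ever handles 15-bit operands, so the sixteenth bit of its result is free to carry the inter-chip status.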

In two-chip/double-precision mode, both chips may share the
same microcode instruction. The only complication to this sharing
is in differentially initializing the MS and LS chips. Internal
logic allows this initialization to be accomplished. Both chips are
fed the instruction designating it as the LS chip. The assertion
of DSEL on the intended MS chip during the SETP instruction
reverses the two LS instruction bits (those defining the chip
configuration to the control register), allowing both MS and LS
designations to be performed simultaneously.

Single-Chip/Double-Precision

(CR5-4 = "11"). In this mode, double-precision (30-bit) addresses
are generated at a rate of one every two cycles. Each address
may be output, incremented or decremented, and compared to a
double-precision compare (C) value. Logical/shift operations are
also supported. Conditional re-initialization with I registers and
the conditional AIR mode are not supported.

LSW operations are executed first, followed by MSW operations
(with the exception of right shifts). Even-numbered R registers
are reserved for LSWs, while odd registers are assumed to be
MSWs. No such restrictions apply to B or C registers; MS or
LS words may be held in any B or C register, but such allocation
must be tracked by the user. After an operation involving LSW
registers, the device stores the carry/shift bit (as appropriate)
needed to complete the double-precision operation. On the next
operation involving MSW registers, this intermediate value is
utilized. Storage of the carry/shift bit occurs only on LSW
operations, except for double-precision right shifting, which starts
with the MSW. If non-addressing operations intervene, the
intermediate value is not disturbed. The comparator will generate
a meaningful CMP signal after each MSW operation.
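
The two-cycle LSW/MSW sequence can be illustrated with a small behavioral model. This is only a sketch under stated assumptions: the class and method names are hypothetical, and the model covers increments only, with even registers holding LSWs and odd registers holding MSWs as the text describes.

```python
MASK15 = 0x7FFF  # only the 15 LSBs of a register are used in this mode


class SingleChipDoublePrecision:
    """Hypothetical model: register pair p uses R[2p] (LSW) and R[2p+1] (MSW)."""

    def __init__(self):
        self.r = [0] * 16   # sixteen R registers, giving eight LSW/MSW pairs
        self.carry = 0      # intermediate carry stored after an LSW operation

    def load(self, pair, value30):
        self.r[2 * pair] = value30 & MASK15
        self.r[2 * pair + 1] = (value30 >> 15) & MASK15

    def value(self, pair):
        return (self.r[2 * pair + 1] << 15) | self.r[2 * pair]

    def inc_lsw(self, pair):
        """First cycle: increment the LSW and store the carry."""
        total = self.r[2 * pair] + 1
        self.r[2 * pair] = total & MASK15
        self.carry = total >> 15

    def inc_msw(self, pair):
        """Second cycle: the MSW operation consumes the stored carry."""
        total = self.r[2 * pair + 1] + self.carry
        self.r[2 * pair + 1] = total & MASK15
```

Calling `inc_lsw` and then `inc_msw` on the same pair completes one 30-bit increment, mirroring the one-address-per-two-cycles rate.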

In this mode, only the 15 LSBs of any register are used. The
LSW and MSW addresses that are supplied are both 15-bit
words. The Y15 (MSB) pin of the 16-bit address port designates
whether the address is the LSW (= 0) or MSW (= 1), and may
be used to control an external mux. Note that the MSB of values
provided via the data (D) port is not meaningful in this mode.

Transparent Mode

(CR6 HI). In this mode, the address port is made transparent
during the entire cycle, rather than only phase one. The transparent
mode may also be used in conjunction with stopping the clock
(LO), in which case the entire device behaves asynchronously
and no updates are written internally.

Latched Mode

(CR6 LO). In latched mode, output values are enabled during
phase one and latched at the address (Y) port during phase two.

Use of the latched mode guarantees that outputs remain stable
throughout the current cycle regardless of changes at the
instruction port. This is in contrast to the transparent mode, in
which such changes may occur quickly enough to alter the
output before cycle end.

Post-Update Mode

(CR9 HI). Addresses are output after the update operation. The
delay between the start of phase one and output of a valid address
is extended in this mode to allow for updating. The addresses
output are equivalent to the values written back into the specified
address (R) register. In this mode, external data may be brought
on chip, modified and output, all in a single clock cycle.

Pre-Update Mode

(CR9 LO). This is the normal update mode in which addresses
are output over the address (Y) port prior to update operations
(increment, decrement, offset, shift, and logical), allowing
addresses to be generated at maximal speed. Note, however, that
this mode requires two cycles to bring external data on chip,
modify it, and supply it as an address.

Conditional AIR Execute Mode 

(CR10 HI). In this mode, a valid CMP flag on looping instructions
causes the next instruction to be executed from the AIR. The
MODULO ADDRESSING section highlights a particularly
valuable use of this mode.

Note that conditional re-initialization of address registers is
disabled when using the conditional AIR execute mode. The
default (ELSE clause) is performed unconditionally (whether or
not the instruction is from the instruction port or the AIR).

(CR10 LO). Conditional AIR execution is disabled. Conditional
re-initialization is fully operational, contingent upon the
re-initialization mask (CR3-0).

Table III summarizes the different ways the CMP status affects
operation of the AG as a function of the conditional AIR execute
mode control bit, CR10, and the re-initialization mask, CR3-0.


Figure 3. Two-Chip Double-Precision Handshaking (MS and LS ADSP-1410 devices, each with D15, Y15 and CMP/Z pins)


CMP       CR10 LO,         CR10 LO,                   CR10 HI
STATUS    mask bit LO      mask bit HI

LO        No Effect        No Effect                  No Effect

HI        CMP/Z goes HI    CMP/Z goes HI;             CMP/Z goes HI;
                           R re-initialized from I    next instr. executed
                                                      from AIR

Table III. Effect of Compare (CMP) Status for Looping
Instructions. Note: CR3-0 is the Re-Initialization Mask.
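
The interaction between CMP status, CR10 and the re-initialization mask can be written as a small decision function. This is a sketch built from the surrounding prose (conditional AIR execution takes priority over re-initialization); the function name and the returned strings are illustrative assumptions.

```python
def cmp_effect(cmp_hi, cr10, mask_bit):
    """Return the effects of a looping instruction's CMP status.

    cmp_hi   -- True when the CMP status is HI
    cr10     -- True when conditional AIR execute mode is enabled
    mask_bit -- True when the relevant re-initialization mask bit is set
    """
    if not cmp_hi:
        return []                       # CMP LO: no effect in any mode
    effects = ["CMP/Z goes HI"]
    if cr10:
        # Conditional AIR mode disables conditional re-initialization.
        effects.append("next instruction executed from AIR")
    elif mask_bit:
        effects.append("R re-initialized from I")
    return effects
```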


INSTRUCTION SET DESCRIPTION 

The ADSP-1410's instruction set is partitioned into six groups,
which are discussed below. First, however, issues spanning
several instruction groups are discussed.

Most of the instruction groups contain instructions using one of
the chip's six offset (B) registers. Without exception, these
instructions have just two bits available for selecting the B register.
Consequently, offset registers are partitioned into two banks.
The upper/lower bank selection is maintained in the control
register (CR8) and is set or cleared by dedicated instructions.
Whenever the "fourth" B register of either bank is specified
(B3 or B7), the ALU's offset source becomes external data (see
Table IV).


CR8 & TWO-BIT OFFSET (B)     OFFSET
REGISTER FIELD               SOURCE

0 00                         B0
0 01                         B1
0 10                         B2
x 11                         Data Port*
1 00                         B4
1 01                         B5
1 10                         B6
x 11                         Data Port*

Table IV. Offset Value Structure

*Explicit use of DSEL is unnecessary when using B3 or B7 offsets; the offset
data is sourced from the data bus by default.
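
Table IV amounts to a three-bit lookup in which the bank bit (CR8) supplies the MS bit and the opcode supplies the two LS bits. The following sketch shows that mapping; the function name is a hypothetical label for illustration.

```python
def offset_source(cr8, field):
    """Return the ALU offset source for bank bit cr8 and a two-bit B field."""
    if cr8 not in (0, 1) or not 0 <= field <= 3:
        raise ValueError("cr8 must be 0/1 and field must be two bits")
    if field == 0b11:
        # The "fourth" register of either bank (B3 or B7) selects external data.
        return "Data Port"
    return f"B{(cr8 << 2) | field}"
```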

In several instruction groups (see mnemonics and opcodes for
details), address (R) registers are used. In all cases, asserting the
DSEL pin allows external data to be substituted for an R value
as both output and update data.

Two instruction groups (looping and logical/shift) both supply
and update the address. Normally, addresses are supplied prior
to updating (pre-update). In post-update mode, however, the
addresses are output after the update operation is performed.
CR9 controls this mode of operation.

For all instructions accessing an offset register, the MS bit of
the three-bit offset register address (bit 2) is fetched from
the control register and is programmed by the SELB instruction.
This is also the case for the YADD and YSUB instructions
(group 1) as pertains to the MS bit of the four-bit address register
address (bit 3), programmed by the SELR instruction. In
both cases, it is incumbent upon the programmer to ensure the
appropriate register bank is selected.

The Y port is only driven on output instructions (mnemonic
form Yxxx, see MNEMONICS AND OPCODES). Otherwise,
the Y port defaults to a high-impedance state.

Instruction Group 1: Looping

Instructions in the looping group supply the contents of a selected
address (R) register to the address (Y) port and then overwrite
the R location with an updated value.

All instructions in this group generate an internal CMP status
indicating whether the supplied address has moved to or beyond
the boundary specified by the compare register. This status may
be monitored externally via the CMP/Z pin. Internal to the
chip, the CMP status can i) be ignored, ii) be used to control
re-initialization of the R register value with a selected I register
value (e.g., to restart an addressing loop), or iii) control execution
of an instruction located in the AIR on the next cycle. Individual
control register bits determine which option is enforced (see
Control Register).

YINC Output & Increment/Init.

Pre-Update Mode: Y ← Rn;

IF (Rn ≥ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn + 1.

Post-Update Mode: Y ← Rn + 1;

IF (Y ≥ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn + 1.

Output an address (R) register on the address (Y) port and
compare it to one of the compare (C) registers. If the address is
less than Cj, the R location is simply updated with an incremented
value. However, if Rn ≥ Cj, CMP status goes HI and the R
register is re-initialized with the Ij value, provided the initialization
mask (CR3-0) is enabled for Ij. Note that other modes of operation
allow CMP status to be ignored (e.g., the instruction executed is
simply "Y ← Rn; Rn ← Rn + 1") or to cause the AIR instruction
to execute on the next cycle.
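
The YINC register-transfer description translates directly into a behavioral model, which also shows how the instruction produces a circular buffer. This is a sketch assuming 16-bit single-precision operation; the function and parameter names are hypothetical.

```python
def yinc(r, c, i, post_update=False, reinit_enabled=True):
    """Model one YINC instruction.

    Returns (y, new_r, cmp_hi): the address driven on the Y port, the value
    written back to the R register, and the CMP status.
    """
    # Pre-update outputs R itself; post-update outputs the incremented value.
    y = ((r + 1) if post_update else r) & 0xFFFF
    cmp_hi = y >= c
    if cmp_hi and reinit_enabled:
        new_r = i                    # THEN clause: re-initialize from I
    else:
        new_r = (r + 1) & 0xFFFF     # ELSE clause: simple increment
    return y, new_r, cmp_hi


def circular_addresses(start, end, count):
    """Generate `count` pre-update addresses cycling through start..end."""
    r, out = start, []
    for _ in range(count):
        y, r, _cmp = yinc(r, c=end, i=start)
        out.append(y)
    return out
```

For example, `circular_addresses(100, 103, 6)` cycles through a four-word buffer and wraps back to the start address after the boundary is reached.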

YDEC Output & Decrement/Init.

Pre-Update Mode: Y ← Rn;

IF (Rn ≤ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn - 1.

Post-Update Mode: Y ← Rn - 1;

IF (Y ≤ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn - 1.

Same as above except the R value is decremented instead of
incremented; CMP is valid if the R value is less than or equal to
the C value.


YADD Output & Add Offset/Init.

Pre-Update Mode: Y ← Rn;

IF (Rn ≥ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn + Bm.

Post-Update Mode: Y ← Rn + Bm;

IF (Y ≥ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn + Bm.

Same as YINC except the R value is summed with the contents
of a selected offset (B) register.

The R register bank select bit (CR7) is used in both the YADD
and YSUB (offset) instructions.

YSUB Output & Subtract Offset/Init.

Pre-Update Mode: Y ← Rn;

IF (Rn ≤ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn - Bm.

Post-Update Mode: Y ← Rn - Bm;

IF (Y ≤ Cj):

THEN Rn ← Ij,

ELSE Rn ← Rn - Bm.

Same as YADD except the selected offset (B) register is subtracted
from the R value.

Instruction Group 2: Register Transfers

Instructions in the register transfer group support internal register
transfers, as well as transfers between internal and external
registers. Internally, any I or B register may be written directly
to any R register. Also, any R register may simultaneously be
output and written directly to a B or C register. For an R-to-R
transfer, the source R register can first be written to a B register,
followed by a write of the B register to an R register on the next
cycle.

Internal registers are read or written externally via the bi-directional
data port. There are explicit instructions to read any of these
registers; however, only the I registers have an explicit Write
instruction. The R, B, and C registers may be written with
external data by executing a transfer instruction (YRTR, YRTB,
and YRTC) and asserting the DSEL pin, substituting the external
data for the designated R value.

YRTR Output & Transfer Addr. Reg. to Self

Y ← Rn

Outputs selected address (R) register over the address (Y) port.
When DSEL is asserted, data port values are output and, in the
same cycle, written into the selected R register.

YRTB Output & Transfer Addr. Reg. to Base Reg.

Y ← Rn; Bm ← Rn

Outputs selected R register over the Y port and copies it into a
selected B register. When DSEL is asserted, data port values
are output and, in the same cycle, written into the selected B
register.

YRTC Output & Transfer Addr. Reg. to Comp. Reg.

Y ← Rn; Cj ← Rn

Same as above, except that values are written to a C register.

DTI Transfer Data Bus to Init. Reg.

Ij ← D

Loads selected I register from data (D) port.

ITR Transfer Init. Reg. to Addr. Reg.

Rn ← Ij

Selected R register is loaded from an I register, allowing a
microprogram to restart a loop at any time.

BTR Transfer Base Reg. to Addr. Reg.

Rn ← Bm

Loads an R register from a B register. Once in the R register,
the B value may be modified and then returned to the B file
(using a YRTB instruction). Recall, use of B3 or B7 will access
the data port as the offset source, allowing R registers to be
initialized directly from the data port.

RTD Transfer Addr. Reg. to Data Bus

D ← Rn

Supplies selected R register to data (D) port.

CTD Transfer Comp. Reg. to Data Bus

D ← Cj

Supplies selected C register to data (D) port.

BTD Transfer Base Reg. to Data Bus

D ← Bm

Supplies selected B register to data (D) port.

ITD Transfer Init. Reg. to Data Bus

D ← Ij

Supplies selected I register to data (D) port.

Instruction Group 3: Logical & Shift

Instructions in the logical/shift group supply a value from a
selected address (R) register to the address (Y) port and then
unconditionally overwrite the selected R location with a modified
version of the output. Modify operations include logical (AND,
OR, and XOR) and shift (one-bit left/right) operations. All
instructions in this group affect the ZERO flag, which goes HI
if the result of the modification is zero. The ZERO flag status is
available externally over the CMP/Z pin.

YOR Output & Logical OR to Addr. Reg.

Y ← Rn; Rn ← (Rn OR Bm)

Selected R register is supplied to the address (Y) port; the specified
R location is then overwritten with the logical OR of the B
register and original R value.

YAND Output & Logical AND to Addr. Reg.

Y ← Rn; Rn ← (Rn AND Bm)

Same as above, except that a logical AND is performed.

YXOR Output & Logical XOR to Addr. Reg.

Y ← Rn; Rn ← (Rn XOR Bm)

Same as above, except that a logical XOR is performed.


YASR Output & Arithmetic Right Shift to Addr. Reg.

Y ← Rn; Rn ← ASR(Rn)

Selected R register is supplied to the address (Y) port; the specified
R location is then overwritten with the original R value
arithmetically shifted right (ASR) by one bit (the MSB is repeated).

YLSL Output & Logical Left Shift to Addr. Reg.

Y ← Rn; Rn ← LSL(Rn)

Selected R register is supplied to the address (Y) port; the specified
R location is then overwritten with the original R value logically
shifted left (LSL) by one bit (the LSB is zero-filled).
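
The two shift operations and the ZERO flag can be modeled on 16-bit values as follows. This is an illustrative sketch for single-precision operation only; the function names are hypothetical.

```python
def asr16(value):
    """Arithmetic shift right by one bit: the MSB (bit 15) is repeated."""
    value &= 0xFFFF
    return (value >> 1) | (value & 0x8000)


def lsl16(value):
    """Logical shift left by one bit: the LSB is zero-filled."""
    return (value << 1) & 0xFFFF


def zero_flag(result):
    """ZERO goes HI when the result of the modification is zero."""
    return (result & 0xFFFF) == 0
```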

Instruction Group 4: Control Register

Instructions in the control register group reset, read, and write
the entire control register or individual control register bits (see
Control Register).

Note the use of "x", "jj", and "pp" to denote values supplied within
the opcode field (see MNEMONICS AND OPCODES). A
positive logic convention is used throughout.

RST Reset Control Reg.

CR ← 0

Clears the entire control register (CR10-0). The RST instruction
has dedicated decoding logic so that it takes precedence even
over the second instruction of a conditional AIR sequence.

DTCR Transfer Data Bus to Control Reg.

CR ← D

Writes the entire control register (CR10-0) from the data port,
D10-0.

CRTD Transfer Control Reg. to Data Bus

D ← CR

Outputs the entire control register (CR10-0) over the data port,
D10-0.

SETI Set/Clear Conditional Init. on CMP Flag

CRjj ← x

Enables conditional re-initialization of an R location, subject to
CMP status (see Control Register). This instruction loads the x
value into the control register bit specified by jj. Conditional
re-initialization of address registers by the Cjj/Ijj pair is inhibited if
the corresponding CRjj is cleared.

SETP Set Chip Precision

CR5-4 ← pp

Loads a 2-bit code (pp) into control register bits 5 and 4, specifying
the addressing mode of the device:

00 = single-precision mode;

01 = double-precision mode, LS chip (10 if DSEL);

10 = double-precision mode, MS chip;

11 = double-precision mode, single-chip.

If the instruction "SETP, 01" is supplied and the MS chip's
DSEL pin is asserted, the CR5-4 bits are reversed, i.e., the MS
chip is loaded with "10", not "01" (see Precision Modes). This
is useful if the MS and LS chips share a common instruction
bus.
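
The SETP encoding and the DSEL override for a shared instruction bus can be sketched as below. Only the documented "01 with DSEL asserted" case is modeled; the function and dictionary names are assumptions made for illustration.

```python
MODES = {
    0b00: "single-precision",
    0b01: "double-precision, LS chip",
    0b10: "double-precision, MS chip",
    0b11: "double-precision, single-chip",
}


def setp(pp, dsel=False):
    """Return the 2-bit value loaded into CR5-4 by a SETP instruction."""
    if not 0 <= pp <= 3:
        raise ValueError("pp must be a 2-bit code")
    if pp == 0b01 and dsel:
        # DSEL asserted on the intended MS chip reverses the two bits.
        return 0b10
    return pp
```

With both chips receiving "SETP, 01", the chip whose DSEL pin is asserted ends up configured as the MS chip and the other as the LS chip.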


SETY Set Y Port to Transparent/Latched Mode

CR6 ← x

Uses the LS instruction bit to set the address (Y) port to the
transparent (HI) or latched (LO) mode. This status is maintained
in control register bit 6.

SELR Select Upper/Lower Addr. Reg. Bank

CR7 ← x

The LS bit of this instruction provides the missing address (R)
register select bit required by the YADD and YSUB instructions.
This selection is maintained in control register bit 7.

SELB Select Upper/Lower Base Reg. Bank

CR8 ← x

The LS bit of this instruction provides the missing B register
select bit required by all instructions utilizing offset (B) registers.
This selection is maintained in control register bit 8.

SETU Set Update Mode (Post/Pre)

CR9 ← x

Setting this bit causes the chip to output address values after
updating them (post-update mode). The LS bit of this instruction
determines the value of control register bit 9.

SETA Set/Clear Conditional AIR Execute Mode

CR10 ← x

Setting this bit causes looping instructions (conditional on
CMP status being HI) to execute the following instruction from
the AIR on the next cycle. In this mode, conditional re-initialization
of R by I on CMP is inhibited. The LS bit of this instruction
determines the value of control register bit 10.

Instruction Group 5: AIR Control

Instructions in the AIR group write and read the Alternate
Instruction Register (AIR). The AIR may be written or read
over the data bus in one cycle or written via the instruction port
in two cycles (see Table I). The instruction contained in the
AIR is executed whenever the AIR Enable pin is asserted or on
the next cycle in the conditional AIR execute mode.

WRA Write AIR with Data Bus

AIR ← D

Write the AIR from the data (D) bus (D9-0).

RDA Read AIR at Data Bus

D ← AIR

Read the AIR over the data (D) bus (D9-0).

LDA Load AIR from Instruction Port on Next Cycle

(Requires DSEL HI)

AIR ← Instruction Port

This instruction is the first of a two-cycle sequence that loads
the AIR via the instruction port. On the cycle following the
execution of LDA, the instruction at the instruction port is
loaded into the AIR (and not executed). DSEL must be asserted
with the LDA instruction (meeting the same setup and hold
time requirements); otherwise, the AIR is not loaded. In systems
with multiple ADSP-1410s sharing microcode instructions, this
feature allows you to select particular devices for AIR loading.


Instruction Group 6: Miscellaneous

YDTY Pass Data Bus to Y Port

Y ← D

Data (D) port values are supplied directly to the address (Y)
port. Note that internal address (R) registers are not affected by
this instruction.

YREV Output Addr. Reg. in Bit-Reversed Format

Y ← YREV(Rn); Rn ← Rn + Bm

The selected address (R) register is bit reversed at the output
port. The original (unreversed) R value is added to the selected
offset (B) register, and written back into the specified R location.
Condition testing is not performed. Bit reversing affects only
output data, not register contents.

NOP No Operation

Prevents any changes to the internal conditions of the AG. All
I/O pins go to the three-state disable mode.

ADDRESS GENERATOR APPLICATIONS

The ADSP-1410 has a wide range of uses in high-speed digital
signal processing and general purpose computer applications. In
particular, this AG can be used in implementing the following:

Circular Data Buffers

- FIR filter tapped delay lines

- Correlator delay lines

- Image processing delay lines

- Recirculated data I/O for transient data capture or stimulus
source

Memory Management

- Fast Fourier Transform data and twiddle factors

- Matrix computations

Table Look-Ups

Masking and table address mapping with AND/OR and bit
reverse capabilities.

Variable-Width Bit Reversing 

The internal bit-reversing multiplexer of the AG accommodates
only full, 16-bit addresses (64K FFTs). For smaller FFTs
(utilizing a right-justified subset of the 16-bit address field), a
zero-overhead software approach may be employed. The details
of this approach may be found in the application note: "Variable-
Width Bit Reversing with the ADSP-1410 Address Generator".
Essentially, the technique is this: an R register is initialized with
the bit-reversed value of the 16-bit starting address (a "pre-reversed"
version of the first data point location) and a B register
with a value determined by K and N, where K is the step size
between samples and N is the order of the FFT. Now, repeated
execution of the YREV instruction will output the appropriate
bit-reversed addresses, updating the R register each time.
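
The YREV loop can be simulated to see why a pre-reversed R register and a fixed B offset yield right-justified bit-reversed addresses. This is a sketch under stated assumptions: the expression for the B value is not legible in this copy, so the choice b = 2^(16 - log2 N) below (for a unit step size and N a power of two) is an assumption of the example, not the application note's formula, and the function names are hypothetical.

```python
def bitrev16(value):
    """Reverse the bit order of a 16-bit value."""
    value &= 0xFFFF
    result = 0
    for _ in range(16):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def yrev_sequence(start, n_points):
    """Simulate repeated YREV: Y <- YREV(R); R <- R + B."""
    log2n = n_points.bit_length() - 1
    if n_points < 2 or (1 << log2n) != n_points:
        raise ValueError("n_points must be a power of two")
    r = bitrev16(start)          # R holds the pre-reversed start address
    b = 1 << (16 - log2n)        # assumed offset for a unit step size
    out = []
    for _ in range(n_points):
        out.append(bitrev16(r))  # output port sees the bit-reversed R
        r = (r + b) & 0xFFFF     # unreversed R is updated by the offset
    return out
```

For an 8-point transform starting at address 0, the model produces the familiar sequence 0, 4, 2, 6, 1, 5, 3, 7.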

Multi-Tasking Operations 

Context switching allowed by large number of on-chip registers
or by instructions allowing all registers to be saved and restored.

16-Bit ALU/Accumulator

By substituting external data for a B register and operating in
post-update mode, ALU operations can be performed at high
speed. ALU sources are the external data and any one of sixteen
internal R registers. Results are stored on-chip in these R registers.
Two chips may be cascaded for double-precision operations.

ANALOG DEVICES

64-Bit IEEE Floating-Point Chipsets
ADSP-3210/3211/3220/3221


FEATURES 

Complete Chipsets for Floating-Point
Arithmetic: Two Multiplier Options and
Two ALU Options

Fully Compatible with IEEE Standard 754
Arithmetic Operations on Four Data Formats:
32-Bit Single-Precision Floating-Point
64-Bit Double-Precision Floating-Point
32-Bit Twos-Complement Fixed-Point
32-Bit Unsigned Fixed-Point
Only One Internal Pipeline Stage
High-Speed Pipelined Throughput
Single-Precision and Fixed-Point Multiplication
Rates to 20 MFLOPS
Double-Precision Multiplication Rates to
5 MFLOPS

Single-, Double-, and Fixed-Point
ALU Rates to 10 MFLOPS
Low Latency for Scalar Operations
140ns for 32-Bit Multiplier Operations
31Sns for 64-Bit Multiplier Operations
240ns for 32-Bit ALU Operations
2S0ns for 64-Bit ALU Operations
IEEE Divide and Square Root (ADSP-3221 ALU)
Flexible I/O Structures:

ADSP-3211/3220/3221: Either One or Two
Input-Port Configuration Modes
ADSP-3210: One Input Port
750mW Max Power Dissipation per Chip with
1.5μm CMOS Technology
100-Lead Pin Grid Array (ADSP-3210 Multiplier)
144-Lead Pin Grid Array (ADSP-3211/3220/3221)
Available Specified to MIL-STD-883, Class B

APPLICATIONS 

High-Performance Digital Signal Processing 
Engineering Workstations 
Floating-Point Accelerators 
Array Processors 
Mini-supercomputers 
RISC Processors 



Word-Slice Microcoded System
with ADSP-3210/3211/3220/3221


The high throughput of these CMOS chips is achieved with
only a single level of internal pipelining, greatly simplifying
program development. Theoretical MFLOPS rates are much
easier to approach in actual systems with this chip architecture
than with alternative, more heavily pipelined chipsets. Also, the
minimal internal pipelining in the ADSP-3210/3211/3220/3221
results in very low latency, important in scalar processing and in
algorithms with data dependencies. To further reduce latency,
input registers can be read into the chips' internal computational
circuits at the rising edge that loads them from the input port
(formerly called direct operand feed).

In conforming to IEEE Standard 754, these chips assure complete
software portability for computational algorithms adhering to
the Standard. All four rounding modes are supported for all
floating-point data formats and conversions. Five IEEE exception
conditions - overflow, underflow, invalid operation, inexact
result, and division by zero - are available externally on status
pins. The IEEE gradual underflow provisions are also supported,
with special instructions for handling denormals. Alternatively,
each chip offers a FAST mode which sets results less than the
smallest IEEE normalized values to zero, thereby eliminating
underflow exception handling when full conformance to the
Standard is not essential.
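
The effect of FAST mode on a single-precision result can be approximated in software. This is a rough sketch, assuming flush-to-zero of any non-zero result below the smallest normalized single-precision magnitude (2^-126); the helper names are hypothetical and the sign of the flushed zero is not modeled.

```python
import struct

F32_MIN_NORMAL = 2.0 ** -126  # smallest normalized IEEE single-precision value


def to_f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fast_mode_f32(x):
    """Return the single-precision result with FAST-mode flush-to-zero."""
    y = to_f32(x)
    if y != 0.0 and abs(y) < F32_MIN_NORMAL:
        return 0.0               # would be a denormal in IEEE mode
    return y
```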


GENERAL DESCRIPTION 

The ADSP-3210/3211 Floating-Point Multipliers and the
ADSP-3220/3221 Floating-Point ALUs are high-speed, low-power
arithmetic processors conforming to IEEE Standard 754. A
chipset consisting of either Multiplier used with either ALU
contains the basic computational elements for implementing a
high-speed numeric processor. Operations are supported on four
data formats: 32-bit IEEE single-precision floating-point, 64-bit
IEEE double-precision floating-point, 32-bit twos-complement
fixed-point, and 32-bit unsigned-magnitude fixed-point.


The instruction sets of the ADSP-3210/3211/3220/3221 are oriented
to system-level implementations of function calculations. Specific
instructions are included to facilitate such operations as
floating-point division and square root, table lookup, quadrant
normalization for trig functions, extended-precision integer operations,
logical operations, and conversions between all data formats.

The ADSP-3210 Floating-Point Multiplier is a one input- and
one output-port device with four input registers. The ADSP-3211
Floating-Point Multiplier adds a second input port and doubles
the number of input registers to eight. It executes all ADSP-3210


Word-Slice is a trademark of Analog Devices, Inc.

STATUS FLAGS

The ADSP-3210/3211/3220/3221 chipset generates on dedicated
pins the following exception flags specified in the IEEE Standard:
Overflow (OVRFLO), Underflow (UNDFLO), Inexact Result
(INEXO), and Invalid Operation (INVALOP). The IEEE
exception condition Division-by-Zero is flagged by the simultaneous
assertion of both OVRFLO and INVALOP pins. The five
IEEE exceptions are defined in accordance with the default
assumption of Std 754 of nontrapping exceptions.

These four flag results are registered in the Status Output Register
when the results they reflect are clocked to the Output Register.
They are held valid until the next rising clock edge. The IEEE
Standard specifies that exception flags when set remain set until
reset by the user. For full conformance to the standard, the
status outputs from this chipset should be individually latched
externally.
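
The external latching requirement can be pictured as a set of "sticky" bits that accumulate the per-cycle status outputs until the user clears them. The class below is a behavioral sketch only; its name and interface are hypothetical and it does not describe any particular latch circuit.

```python
class StickyFlags:
    """Hypothetical external latch: flags stay set until explicitly reset."""

    NAMES = ("OVRFLO", "UNDFLO", "INEXO", "INVALOP")

    def __init__(self):
        self.flags = dict.fromkeys(self.NAMES, False)

    def clock(self, **pins):
        """Capture the status pins for one cycle; set bits are never cleared here."""
        for name in self.NAMES:
            self.flags[name] = self.flags[name] or bool(pins.get(name, False))

    def reset(self, name):
        """User-initiated reset of one flag, as the IEEE Standard requires."""
        self.flags[name] = False
```

A flag asserted for a single cycle therefore remains visible to software on later cycles, even though the chip's own status output is only held valid until the next rising clock edge.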

Denormal Input 

In addition to the IEEE status flags, the ADSP-3210/3211
Multipliers have a DENORM output flag that signals the presence
of a denormalized number at one of the input registers being
read into the multiplier array. This denormal must be wrapped
by the ALU before the Multiplier can read it. To minimize the
system response time to a denormal input exception, the
DENORM flag comes out earlier than the associated IEEE status
flags. DENORM is normally in an indeterminate state. For
single-precision multiplications, DENORM goes HI during the
cycle after a denormal was read into the array (with the RDA/B
controls). See Figure T4. For double-precision multiplications,
DENORM goes HI during the third cycle after a denormal was
read into the multiplier array. See Figure T5. Both Multipliers
produce ZERO results under these conditions. The DENORM
flag is asserted in both IEEE and FAST modes.

Some multiplications with denormal operands do not require
wrapping and therefore do not cause the assertion of the DENORM
flag. These are DNRM*ZERO, DNRM*INF, and DNRM*NAN.
Multiplication of a finite number by zero always yields zero -
the result the Multiplier will produce anyway - so there is no
need to signal an exception. Any finite number multiplied by
INF should yield INF, and the ADSP-3210/3211 Multipliers
will produce this result with a DNRM operand, hence no wrapping
is required. And multiplication of any number by a NAN produces
a NAN (and the INVALOP flag); no wrapping is necessary for
the Multipliers to produce this correct IEEE result.

Note that the ALUs in general operate directly on denormals
and therefore do not flag any exception. The ADSP-3221 ALU,
however, cannot operate directly on denormals in its division
and square root operations. For these operations, denormal
inputs will cause the simultaneous assertion of UNDFLO and
INVALOP in IEEE mode. For divisions, INEXO HI indicates
that the dividend is a DNRM; INEXO LO indicates that the
divisor or both operands are DNRMs. In FAST mode, only
INVALOP will be asserted. This denormal exception information
becomes available with the status outputs, i.e., at the end of an
attempted multicycle division or square root. In both modes for
both division and square root, a properly signed all-ones NAN
will be produced.

Invalid Operation and NAN results 
INVALOP is generated whenever attempting to execute an
invalid operation, as defined in Std 754 Section 7.1. The
INVALOP output is also used in conjunction with other pins to
indicate the Division-by-Zero exception and denormal divisor or
dividend. The default nontrapping result is required to be a
quiet NAN. Except when passing a NAN with PASS or copying
a sign bit to a NAN, the ADSP-3210/3211/3220/3221 chipset
will always produce a NAN with an exponent and fraction of all
ones as a result of an invalid operation.

Conditions that cause the assertion of INVALOP are:

• NAN input read to computational circuitry (except for logical
PASS)

• Multiplication of either ±INF by either ±ZERO

• In FAST mode, multiplication of either ±INF by either
±DNRM

• Subtraction of like-signed INFs or addition of opposite-signed
INFs

• Conversion of a NAN or INF to fixed-point

• Wrapping an operand that is neither a denormal nor ZERO

• Division of either ±ZERO by either ±ZERO or of either
±INF by either ±INF

• Attempting the square root of a negative number

• In conjunction with OVRFLO, the Division-by-Zero
exception

• In FAST mode, a denormal divisor or dividend. In IEEE
mode, in conjunction with UNDFLO, a denormal divisor or
dividend

• In conjunction with UNDFLO, a denormal input operand to
square root

Division-by-Zero 

The Division-by-Zero exception is generated whenever attempting
to divide a finite non-zero dividend by a divisor of zero (Std 754
Section 7.2). The Division-by-Zero exception is indicated on the
ADSP-3221 ALU by the simultaneous assertion of both OVRFLO
and INVALOP. The ALU result is always a correctly signed
INF.
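
Because some exceptions are signaled by pin combinations rather than by a dedicated pin, system software or glue logic has to decode the four status outputs together. The function below is a simplified sketch of such a decoder for ADSP-3221 division and square root results in IEEE mode; its name and returned labels are assumptions, and it reports only the highest-priority condition rather than every flag that is set.

```python
def decode_status(ovrflo, undflo, inexo, invalop):
    """Classify one set of status pins, checking combined encodings first."""
    if invalop and ovrflo:
        return "division by zero"
    if invalop and undflo:
        return "denormal operand to division or square root"
    if invalop:
        return "invalid operation"
    if ovrflo:
        return "overflow"
    if undflo:
        return "underflow"
    if inexo:
        return "inexact"
    return "none"
```

The order of the tests matters: the two-pin encodings must be checked before the single-pin meanings, otherwise a Division-by-Zero would be misreported as a plain invalid operation.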

Overflow 

OVRFLO is generated whenever the unbounded (i.e., supposing
hypothetically no bounds on the exponent range of the result),
post-rounded result exceeds in magnitude NORM.MAX in the
destination format, as defined in Std 754 Section 7.3. Note that
the overflow condition can occur both during computations and
during data format conversions. The result will be either ±INF
or ±NORM.MAX, depending on the sign of the result and the
operative rounding mode. (See "Rounding - RND Controls"
above.) The OVRFLO pin is also used to signal additional
exception conditions.

Conditions that cause the assertion of OVRFLO are:

• Unbounded, post-rounded result exceeds destination format
in computation or conversion

• In conjunction with INVALOP, the Division-by-Zero exception
on the ADSP-3221 ALU

• Comparison when operand A is greater than operand B

• Exponent subtraction when the resultant exponent is more
positive than can be represented in the destination format

• Twos-complement fixed-point additions and subtractions that
overflow

Note that OVRFLO is always LO when the ADSP-3210/3211
Multipliers are in fixed-point mode.

Underflow 

Underflow is defined in four ways in Std 754 Section 7.4. The
IEEE Standard allows the implementer to choose which definition
of underflow to use and provides no guidance. The first option
is whether to flag underflow based on results before or after
rounding. Consistent with the definition of overflow, underflow
is always flagged with this chipset based on results after rounding
(except for the operations of conversion from floating-point to
fixed-point and logical downshifts). Thus, a result whose infinitely
precise value is less than NORM.MIN yet which rounds to
NORM.MIN will not be considered to have underflowed.

The second option is how to interpret what the Standard calls
an "extraordinary loss of accuracy." The first way is in terms of
the creation of non-zero, post-rounded numbers smaller in
magnitude than NORM.MIN. The second way is in terms of loss of
accuracy when representing numbers as denormals. With the
ADSP-3210/3211/3220/3221 chipset, the conditions under which
UNDFLO is asserted depend on whether the chip in question
can generate denormals in its current operating mode. If the
chip cannot generate denormals, the definition in terms of numbers
smaller in magnitude than NORM.MIN will apply; if it can
generate denormals, the definition in terms of inexact denormals
will apply. Thus, which definition applies will depend on whether
the chipset is operating in IEEE or FAST mode, whether its
result is generated by a Multiplier or an ALU, and whether the
operation is division.

With the ADSP-3210/3211 Multipliers, UNDFLO is generated
whenever the unbounded, post-rounded, non-zero result is of
lesser magnitude than NORM.MIN in the destination format,
both in FAST and IEEE modes. In FAST mode, the data result
will be ZERO; in IEEE mode the data result will be in the
wrapped format. An exact ZERO result will never cause the
assertion of UNDFLO.

With the ADSP-3220/3221 ALUs in the FAST mode, UNDFLO 
is also generated whenever the unbounded, post-rounded, non-zero 
result is of lesser magnitude than NORM.MIN in the destination 
format for standard ALU operations as well as for division and 
square root. For FAST mode underflows, the ALU result will 
always be ZERO. 

With the ADSP-3220/3221 ALUs in IEEE mode, UNDFLO is
generated (except for divisions) whenever the unbounded,
infinitely precise (i.e., supposing hypothetically no bounds on the
precision of the result), post-rounded result is a denormal and
does not fit into the denormal destination format without a loss
of accuracy. In other words, UNDFLO will be generated whenever
an inexact denormal result is produced. (See “Inexact” below.)

If the result is a denormal and does fit exactly, neither UNDFLO
nor INEXO will be asserted. Note that additions, subtractions,
and comparisons cannot generate this underflow condition (since
no operand contains significant bits of lesser magnitude than
DNRM.MIN). IEEE-mode ALU underflow exceptions occur
only during conversions and divisions.

The division operation is treated like a multiplication operation
in IEEE mode rather than an ALU operation in the definition
of underflow. A quotient from division smaller in magnitude
than NORM.MIN will always be flagged as underflowed with
the ADSP-3221 ALU. The data result will be in the wrapped
format. Note that √(DNRM.MIN) ≥ NORM.MIN. Therefore,
square root will never underflow with operands greater than or
equal to DNRM.MIN.
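
The choice between the two underflow definitions, as described in the preceding paragraphs, can be summarized in a short lookup. This is only a reading aid under the stated rules; the function name, argument names and return labels are hypothetical and do not correspond to any pin or register on the chipset.

```python
def underflow_definition(unit: str, mode: str, is_division: bool = False) -> str:
    """Return which underflow definition governs UNDFLO.

    unit: "multiplier" (ADSP-3210/3211) or "alu" (ADSP-3220/3221)
    mode: "FAST" or "IEEE"
    Returns "magnitude" (non-zero result smaller than NORM.MIN) or
    "inexact-denormal" (denormal result that loses accuracy).
    """
    if mode not in ("FAST", "IEEE"):
        raise ValueError("mode must be 'FAST' or 'IEEE'")
    if unit == "multiplier":
        # Multipliers use the magnitude definition in both modes.
        return "magnitude"
    if unit == "alu":
        if mode == "FAST" or is_division:
            # FAST-mode ALU results and all quotients use the magnitude test.
            return "magnitude"
        # IEEE-mode ALU operations other than division.
        return "inexact-denormal"
    raise ValueError("unit must be 'multiplier' or 'alu'")
```

For instance, `underflow_definition("alu", "IEEE", is_division=True)` returns `"magnitude"`, reflecting that division is treated like a multiplication for underflow purposes.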

Conditions that cause the assertion of UNDFLO are:

• With the ADSP-3210/3211 Multipliers, whenever the
unbounded, post-rounded, non-zero result is of lesser magnitude
than NORM.MIN in the destination format

• With the ADSP-3220/3221 ALUs in the FAST mode, whenever
the unbounded, post-rounded, non-zero result is of lesser
magnitude than NORM.MIN in the destination format

• With the ADSP-3220/3221 ALUs in IEEE mode, whenever
an inexact denormal is produced or whenever the unbounded,
post-rounded, non-zero quotient from division is of lesser
magnitude than NORM.MIN in the destination format

• Conversions to integer if the magnitude of the floating-point
source before rounding is less than one

• Conversions from DP floating-point to SP floating-point
whenever the unbounded, post-rounded, non-zero result is
less than SP DNRM.MIN or whenever an inexact denormal
is produced

• Comparison when operand A is less than operand B 

• Attempting to wrap a ZERO 

• Unwrapping if there is a loss of accuracy 

• Exponent subtraction when the resultant exponent is more
negative than can be represented in the destination format

• Logical downshift that before rounding would have shifted all 
bits out of the destination format 

• In conjunction with INVALOP, a denormal divisor or
dividend

• A quotient from division less than NORM.MIN 

• In IEEE mode, in conjunction with INVALOP, a denormal
input operand for square root

Inexact 

The inexact exception is defined in Std 754 Section 7.5 as the
loss of accuracy of the unbounded, infinitely precise result when
fitted to the destination format. It is signalled on the
ADSP-3210/3211/3220/3221 chipset by INEXO.

For fixed-point operations, the ADSP-3210/3211 Multipliers will
assert INEXO HI if and only if any of the least-significant 32
bits of prerounded 64-bit products are ones. They never assert
INEXO for logical operations. The ADSP-3220/3221 ALUs
never assert INEXO for fixed-point or logical operations.

In an ADSP-3221 division operation, either a denormal divisor
or a denormal dividend will cause the simultaneous assertion of
UNDFLO and INVALOP. INEXO will, in that context, signal
which of the two was the denormal: INEXO LO indicates that
the divisor is a denormal; INEXO HI indicates that the dividend
is a denormal.

Conditions that cause the assertion of INEXO are:

• Loss of accuracy when fitting result to destination format

• For fixed-point operations, the prerounded multiplier 64-bit
product contains ones in the least-significant 32 bits

• In IEEE mode, in conjunction with both UNDFLO and
INVALOP, dividend is a denormal (HI) or divisor is a denormal
or both are denormals (LO)
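
The way INEXO is reused to identify a denormal operand in division can be sketched as a small decoder. It assumes the operation just executed was an ADSP-3221 division (the same UNDFLO and INVALOP pair is also used for a denormal square-root operand); the function name and the returned strings are hypothetical.

```python
from typing import Optional


def denormal_division_operand(undflo: bool, invalop: bool, inexo: bool) -> Optional[str]:
    """Decode the status flags after a division, per the text above.

    Returns None when the flags do not indicate a denormal operand.
    Assumption: the caller already knows the operation was a division.
    """
    if not (undflo and invalop):
        return None
    if inexo:
        # INEXO HI: the dividend is the denormal.
        return "dividend"
    # INEXO LO: the divisor is a denormal, or both operands are.
    return "divisor (or both operands)"
```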

Less Than, Equal, Greater Than, and Unordered

For comparison operations in the ALUs, the OVRFLO,
UNDFLO, and INVALOP status outputs are used to indicate
the four comparison conditions of IEEE Std 754, Section 5.7.
They are defined as follows:

• “Less than” is signalled by the assertion of UNDFLO (while 
OVRFLO is LO) 

• “Equal” is signalled by not asserting either OVRFLO or
UNDFLO (i.e., both LO)

• “Greater than” is signalled by the assertion of OVRFLO
(while UNDFLO is LO)

• “Unordered” is signalled by the assertion of INVALOP,
caused by attempting a comparison with at least one NAN
operand
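
The four comparison conditions listed above map directly onto a decoder for the status outputs. The sketch below is illustrative only; the function name and the returned strings are assumptions, and INVALOP is checked first on the reading that a NAN operand takes precedence over the other two flags.

```python
def decode_comparison(ovrflo: bool, undflo: bool, invalop: bool) -> str:
    """Translate ALU comparison status flags into an IEEE comparison result."""
    if invalop:
        # At least one operand was a NAN.
        return "unordered"
    if ovrflo and not undflo:
        return "greater than"   # A > B
    if undflo and not ovrflo:
        return "less than"      # A < B
    if not ovrflo and not undflo:
        return "equal"          # A == B
    # The text does not define OVRFLO and UNDFLO both HI for a comparison.
    raise ValueError("OVRFLO and UNDFLO both HI is not a defined comparison status")
```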

The data result from a comparison operation is identical to
subtracting operand B from operand A. See Tables XI and
XII.

In IEEE comparisons, the data types are always ordered in
ascending sequence: -INF, -NORM, -DNRM, ZERO,
DNRM, NORM, and INF. Comparisons between like-signed
INFs will generate the “Equal” status condition. Comparisons
between signed ZEROs will also generate the “Equal” status.
Any comparison to a NAN will also cause INVALOP and produce
an all-ones NAN. Even in FAST mode, DNRMs will be compared
based on their true value (rather than all being treated as
ZEROs).

Special Flags for Unwrapping 

The ADSP-3210/3211 generates a Round Carry Propagation Out
flag, RNDCARO, that indicates whether or not a carry bit
propagated into the destination format’s fraction during the
Multiplier’s floating-point rounding operation. The rounding
that the Multiplier does in creating the wrapped or unnormal
result may cause a carry bit into the LSB in the destination
format’s fraction. This rounding position will not in general be
correct for a properly rounded denormal. Thus, when the
underflowed Multiplier result is unwrapped to a denormal, the
ALU has to undo the Multiplier’s rounding and re-round to
achieve the properly rounded denormal.

To do this, the ALU has to know if any carry bits in the Multiplier’s
rounding operation propagated into the fraction of the result.
This information is provided in the Multiplier’s RNDCARO
flag. The ALU also needs to know if the Multiplier’s rounded
result caused a loss of accuracy when expressed in its destination
wrapped format, indicated by the Multiplier’s Inexact Result
(INEXO) flag.

The ADSP-3220/3221 ALUs have a corresponding pair of flag
status input pins: Round Carry Propagation In (RNDCARI)
and Inexact Data In (INEXIN). In an unwrap operation, these
flags are used by the ALU when converting from a WNRM to a
DNRM to obtain the properly rounded result. RNDCARI and
INEXIN should be set up to the ALU with the instruction for
the unwrap operation. Both Multiplier and ALU must be using
the same rounding mode.

The ADSP-3221 ALU itself generates WNRMs in underflowed
division operations. These WNRMs must be fed back to the
ALU to be unwrapped to DNRMs. The ADSP-3221, unlike the
Multipliers, does not have a RNDCARO pin to signal whether
or not a carry bit propagated into the destination format on
rounding. For this reason, WNRMs produced by the ADSP-3221
ALU in division are rounded differently than they are on the
Multipliers; underflowed (only) quotients are always truncated
(Round-toward-Zero) to the destination wrapped format. Hence
there is no carry bit propagation. When unwrapping a WNRM
produced in division, RNDCARI should always be held LO.
INEXIN should reflect the status of INEXO when the ALU
produced the underflowed wrapped quotient.

The ADSP-3221 ALU also uses the RNDCARI and INEXIN
pins to indicate wrapped A and B operands, respectively, to
division and square root operations. Both RNDCARI and INEXIN
should be held LO except for unwrap, division, and square root
operations.
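
As a reading aid, the rules for driving RNDCARI and INEXIN during an unwrap can be condensed into one helper. This is a sketch under the assumption that the flags captured when the WNRM was produced are available to the microcode; the function name and the `source` labels are hypothetical.

```python
from typing import Tuple


def unwrap_flag_inputs(source: str, rndcaro: bool, inexo: bool) -> Tuple[bool, bool]:
    """Return (RNDCARI, INEXIN) to present with the unwrap instruction.

    source: "multiplier" if the WNRM came from an ADSP-3210/3211 product,
            "division" if it came from an ADSP-3221 underflowed quotient.
    rndcaro, inexo: flags captured when the WNRM was produced
                    (rndcaro is ignored for division, which has no such pin).
    """
    if source == "multiplier":
        # Pass both Multiplier flags through to the ALU inputs.
        return (rndcaro, inexo)
    if source == "division":
        # Quotients are truncated, so there is never a rounding carry.
        return (False, inexo)
    raise ValueError("source must be 'multiplier' or 'division'")
```

In both cases the Multiplier and ALU are assumed to be using the same rounding mode, as the text requires.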

INSTRUCTIONS AND OPERATIONS 
The ADSP-3210/3211 Multipliers execute the same instruction
every cycle: multiply. It need not be specified explicitly in
microcode. The data format of results and status flags from multiplication
are shown in Tables IX and X. Note that double-precision
floating-point multiplications are multicycle operations. Data
must be available in the input registers as shown above in
Figure 23.

Denormal input operands will generally cause the DENORM
exception (see “Status Flags” above) and correctly signed ZERO
results. FAST mode suppresses the DENORM exception. In
either FAST or IEEE, DNRM*ZERO will be ZERO without
exception. DNRM*INF will be a correctly signed INF without
exception in IEEE mode and a NAN and INVALOP in FAST
mode. DNRM*NAN will be a correctly signed NAN with
INVALOP asserted. The sign bit of the NAN generated from any
invalid operation will depend on the operands. (The IEEE Standard
does not specify conditions for the sign bit of a NAN.) On the
ADSP-3210/3211 Multipliers, the sign of a NAN result will be
the exclusive OR of the signs of the input operands.

The product of INF with anything except ZERO or NAN is a
correctly signed INF. INF*ZERO will cause INVALOP and
yield a NAN. NAN times anything will also cause INVALOP
and yield a NAN.
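
The special-operand rules in the two paragraphs above can be collected into a small table-driven sketch. It covers only the operand classes the text discusses and ignores signs; the function name, the class labels and the use of `None` for ordinary operands are assumptions of the example, not part of the chipset documentation.

```python
def multiply_special(a: str, b: str, mode: str = "IEEE"):
    """Result class and flags for Multiplier products with special operands.

    a, b: one of "ZERO", "DNRM", "NORM", "INF", "NAN"
    mode: "IEEE" or "FAST"
    Returns (result_class, set_of_flags), or None when neither operand
    triggers one of the special rules described in the text.
    """
    ops = {a, b}
    if "NAN" in ops:
        return ("NAN", {"INVALOP"})            # NAN times anything
    if ops == {"INF", "ZERO"}:
        return ("NAN", {"INVALOP"})            # INF * ZERO
    if "DNRM" in ops:
        if "ZERO" in ops:
            return ("ZERO", set())             # no exception in either mode
        if "INF" in ops:
            if mode == "IEEE":
                return ("INF", set())
            return ("NAN", {"INVALOP"})        # FAST mode
        # General case: ZERO result; FAST mode suppresses DENORM.
        return ("ZERO", {"DENORM"} if mode == "IEEE" else set())
    if "INF" in ops:
        return ("INF", set())                  # INF times a non-zero, non-NAN value
    return None                                # ordinary operands, not covered here
```

The sign of a NAN result, which this sketch omits, would be the exclusive OR of the operand signs.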

The ADSP-3220/3221 ALUs, in contrast to the Multipliers, are
instruction driven with the operation specified by I8-0. The
ALU instructions fall into four categories: Fixed-Point, Logical,
Single-Precision Floating-Point, and Double-Precision
Floating-Point. Instructions are summarized in Tables V through VIII
and described in this section below. The data format of results
and status flags from the various ALU operations are shown in
Tables XI and XII. Division is shown in Tables XIII and XIV;
square root in Table XV. Conversions are illustrated in Tables
XVI, XVII, and XVIII.

The ADSP-3220/3221 Fixed-Point Arithmetic Operations are:

            Instruction (I8-0)
Mnemonic    I8-6  I5-3  I2-0    Description

IADD        001   000   011     Fixed-point A + B
ISUBB       001   001   011     Fixed-point A - B
ISUBA       001   000   111     Fixed-point B - A
IADDWC      001   010   011     Fixed-point A + B with carry
ISUBWBB     001   011   011     Fixed-point A - B with borrow
ISUBWBA     001   010   111     Fixed-point B - A with borrow
INEGA       001   000   101     Fixed-point -A. ABSA/B must be LO.
INEGB       001   001   010     Fixed-point -B. ABSA/B must be LO.
IADDAS      001   100   011     Fixed-point |A + B|
ISUBBAS     001   101   011     Fixed-point |A - B|. ABSA/B must be LO.
ISUBAAS     001   100   111     Fixed-point |B - A|. ABSA/B must be LO.

Table V. ADSP-3220/3221 Fixed-Point ALU Operations

The ADSP-3220/3221 Logical Operations are:

            Instruction (I8-0)
Mnemonic    I8-6  I5-3  I2-0    Description

COMPLA      000   000   101     Ones-complement A
COMPLB      000   001   010     Ones-complement B
PASSA       000   000   001     Pass A unmodified. Set no flags.
PASSB       000   000   010     Pass B unmodified. Set no flags.
AANDB       000   010   010     Bitwise logical AND
AORB        000   100   010     Bitwise logical OR
AXORB       000   110   010     Bitwise logical XOR
NOP         000   000   000     No operation. Preserve status flags. Preserve
                                Output Register contents with ADSP-3221 only.
CLR         100   000   000     Clear all status flags. Data register contents
                                are unaffected.

Table VI. ADSP-3220/3221 ALU Logical Operations
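
Since each ALU operation is selected by the nine-bit field I8-0, a microcode tool would typically assemble the three 3-bit groups into one word. The sketch below shows that packing for three of the fixed-point mnemonics. The bit patterns are transcribed from Table V as read from a degraded scan, so they should be checked against the original data sheet before use; the dictionary and function names are hypothetical.

```python
# (I8-6, I5-3, I2-0) bit groups, transcribed from Table V.
# Assumption: the transcription of the scanned table is correct.
FIXED_POINT_OPS = {
    "IADD":  ("001", "000", "011"),   # Fixed-point A + B
    "ISUBB": ("001", "001", "011"),   # Fixed-point A - B
    "ISUBA": ("001", "000", "111"),   # Fixed-point B - A
}


def instruction_word(mnemonic: str) -> int:
    """Pack the three 3-bit groups into the 9-bit instruction I8-0."""
    i8_6, i5_3, i2_0 = FIXED_POINT_OPS[mnemonic]
    return (int(i8_6, 2) << 6) | (int(i5_3, 2) << 3) | int(i2_0, 2)


for name in FIXED_POINT_OPS:
    # Print each mnemonic with its 9-bit encoding in binary.
    print(name, format(instruction_word(name), "09b"))
```

Keeping the groups as separate strings mirrors the table layout, which makes a transcription error easier to spot than a single packed constant would.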


The ADSP-3220/3221 Single-Precision Floating-Point Operations are:

            Instruction (I8-0)
Mnemonic    I8-6  I5-3  I2-0    Description

SADD        111   000   011     SP FltgPt (A + B)
SSUBB       111   000   111     SP FltgPt (A - B)
SSUBA       111   001   011     SP FltgPt (B - A)
SCOMP       111   001   111     SP FltgPt comparison of A to B. Result is (A - B)
                                Greater Than = (OVRFLO HI)
                                Equal = (OVRFLO LO & UNDFLO LO)
                                Less Than = (UNDFLO HI)
                                Unordered = INVALOP HI
SADDAS      011   000   011     SP FltgPt |A + B|
SSUBBAS     011   000   111     SP FltgPt |A - B|
SSUBAAS     011   001   011     SP FltgPt |B - A|
SFIXA       011   001   101     Convert SP FltgPt A to twos-complement integer
SFIXB       011   001   110     Convert SP FltgPt B to twos-complement integer
SFLOATA     011   100   101     Convert twos-complement integer A to SP FltgPt
SFLOATB     011   100   110     Convert twos-complement integer B to SP FltgPt
DOUBLEA     011   101   101     Convert SP FltgPt A to DP FltgPt
DOUBLEB     011   101   110     Convert SP FltgPt B to DP FltgPt
SPASSA      011   110   001     Pass SP FltgPt A. NANs cause INVALOP.
SPASSB      011   110   010     Pass SP FltgPt B. NANs cause INVALOP.
SWRAPA      011   100   001     Wrap SP DNRM A to SP WNRM
SWRAPB      011   100   010     Wrap SP DNRM B to SP WNRM
SUNWRAPA    011   010   001     Unwrap SP WNRM A to SP DNRM
SUNWRAPB    011   010   010     Unwrap SP WNRM B to SP DNRM
SSIGN       011   111   101     Copy sign from SP FltgPt B to SP FltgPt A. Result is
                                [sign B, exponent A, fraction A].
SXSUB       011   111   001     Subtract B exponent from A exponent. Result is
                                [sign A, (expt A - expt B), fraction A] for all data types.
                                If the unbiased exponent ≥ +128, INF results.
                                If the unbiased exponent is -127, ZERO results.
SITRN       011   010   101     Downshift SP FltgPt A mantissa (with hidden bit) logically by the
                                unbiased SP FltgPt B exponent to a 32-bit
                                unsigned-magnitude integer. Use RZ only.

ADSP-3221 ALU only:

SDIV        011   110   111     SP FltgPt (A ÷ B)
SSQR        111   110   110     SP FltgPt square root

Table VII. ADSP-3220/3221 ALU Single-Precision Floating-Point Operations